Solution of the 3D density-driven groundwater flow problem with uncertain porosity and permeability

As groundwater is an essential resource for drinking water and irrigation, its pollution may lead to catastrophic consequences. Therefore, accurate modeling of the pollution of soil and groundwater aquifers is highly important. As a model, we consider a density-driven groundwater flow problem with uncertain porosity and permeability. Such problems arise in geothermal reservoir simulation, natural saline-disposal basins, modeling of contaminant plumes, and subsurface flow. This strongly nonlinear, time-dependent problem describes the convection of the two-phase flow. The liquid streams under the gravity force, building so-called "fingers". An accurate numerical solution requires a fine spatial resolution on an unstructured mesh and, therefore, high computational resources. Consequently, we run the parallelized simulation toolbox \myug with the geometric multigrid solver on the Shaheen II supercomputer. The parallelization is done in both the physical and the stochastic space. Additionally, we demonstrate how the \myug toolbox can be run in a black-box fashion for testing different scenarios of the density-driven flow. As a benchmark, we solve the Elder-like problem in a 3D domain. For approximations in the stochastic space, we use the generalized polynomial chaos expansion. We compute the mean, variance, and exceedance probabilities of the mass fraction. As a reference, we use the solution obtained with the quasi-Monte Carlo method.


Motivation
This work is a 3D extension of our previous 2D work [47]. The problem setup is the same; the difference is only in the aquifer. In this work, we consider two 3D aquifers. These new 3D geometries contain a much larger number of degrees of freedom (DoFs), and, therefore, numerical experiments require significantly more computational resources.
Accurate prediction of groundwater contamination is essential. Certain pollutants are soluble in water and can leak into groundwater systems, for example, when seawater intrudes into coastal aquifers or wastewater leaks underground. Moreover, some pollutants change the density of the fluid and induce density-driven flows within the aquifer. This causes a faster propagation of the contamination due to convection. Thus, simulation and analysis of this density-driven flow play an important role in predicting how pollution can migrate through an aquifer [28,43].
In contrast to the transport of pollution by molecular diffusion, convection due to density-driven flow is an unstable process that can undergo quite complicated patterns of distribution. The Elder problem is a simplified but comprehensive model that describes the intrusion of salt water from a top boundary into an aquifer [23,24,25,79]. The evolution of the concentration profile is typically referred to as fingering. Note that due to the nonlinear nature of the model, the distribution of the contamination strongly depends on the model parameters, so that the system may have several stationary solutions [40].
Hydrogeological formations typically have a complicated and heterogeneous structure, and geological media may consist of layers of porous media with different porosities and permeability coefficients [68,73]. Difficulty in specifying hydrogeological parameters of the media, as well as measuring the position and configuration of the layers, gives rise to certain errors. Typically, the averaged values of these quantities are used. However, due to the nonlinearities of the flow model, the averaging of the parameters does not necessarily lead to the correct mathematical solution. Uncertainties arise from various factors, such as inaccurate measurements of the different parameters and the inability to measure parameters at each spatial location at any given time. To model the uncertainties, we can use random variables, random fields, and random processes. However, uncertainties in the input data spread through the model and make the solution uncertain too. Thus, an accurate estimation of the output uncertainties is crucial.
Many techniques are used to investigate the propagation of uncertainties associated with porosity and permeability into the solution, such as classical Monte Carlo sampling. Although Monte Carlo sampling is dimension-independent, it converges slowly and therefore requires many samples, and these simulations may be time-consuming and costly. Alternative, less costly techniques use collocation, sparse grids, and surrogate methods, each with its advantages and disadvantages. But even advanced techniques may require hundreds to thousands of simulations and assume a certain smoothness of the quantity of interest. Our simulations contain up to 4.5 million spatial mesh points and 1000-3000 time steps and are therefore run on a massively parallel cluster. Here, we develop methods that require far fewer simulations, from a few dozen to a few hundred.
Perturbation methods, which decompose the quantity of interest (QoI) with respect to random parameters in a Taylor series, were considered and compared in [19].
Here, we use the well-established generalized polynomial chaos expansion (gPC) technique [82], where we compute the gPC coefficients by applying a Clenshaw-Curtis quadrature rule [4]. We use the gPC to compute QoIs such as the mean, the variance, and the exceedance probabilities of the mass fraction (in this case, of a saltwater solution). We validate the obtained results using the quasi-Monte Carlo approach. Both methods require the computation of multiple simulations (scenarios) for variable porosity and permeability coefficients.
To the best of our knowledge, no other reported works have solved the Elder problem [23,79] with uncertain porosity and permeability parameters using gPC.
Fingers in a highly unstable free convective flow have been studied in [84]. Overviews of the uncertainties in modeling groundwater solute transport [13] and modeling soil processes [77] have been performed, as well as reconnecting stochastic methods with hydrogeological applications [9], which included recommendations for optimization and risk assessment. Fundamentals of stochastic hydrogeology, an overview of stochastic tools and accounting for uncertainty have been described in [72].
Recent advances in uncertainty quantification, probabilistic risk assessment, and decision-making under uncertainty in hydrogeologic applications have been reviewed in [76], where the author reviewed probabilistic risk assessment methods in hydrogeology under parametric, geologic, and model uncertainties. Density-driven vertical transport of saltwater through the freshwater lens has been modeled in [66].
The quantification of uncertainties in stochastic PDEs can pose a great challenge due to the possibly large number of random variables involved, and the high cost of each deterministic solution.
For problems with a large number of random variables, methods based on a regular grid (in the stochastic space) are less preferable due to their high computational cost. Instead, methods based on scattered sampling schemes (MC, qMC) give more freedom in choosing the number of samples N and protect against sample failures. The MC quadrature and its variance-reduced variants have the dimension-independent convergence rate O(N^{-1/2}), and qMC has the worst-case convergence rate O(log^M(N) N^{-1}), where M is the dimension of the stochastic space [54]. Collocation on sparse or full grids is affected by the dimension of the integration domain [2,60,61]. In this work, we use the Halton rule to generate quasi-Monte Carlo quadrature points [12,39]. A numerical comparison of other qMC sequences is given in [67].
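To illustrate the sampling side, the following sketch compares a Halton qMC estimate with a plain MC estimate. The integrand `model` is a cheap toy function standing in for one PDE solve c(θ); it and all numbers here are assumptions for the sketch, not quantities from the paper.

```python
import numpy as np
from scipy.stats import qmc

# Toy integrand standing in for one PDE solve c(theta); its true mean
# over [0,1]^3 is known analytically, so the error can be checked.
def model(theta):
    return np.prod(np.sin(np.pi * theta), axis=-1)  # E[f] = (2/pi)^3

M, N = 3, 2048
halton = qmc.Halton(d=M, scramble=False)
theta_qmc = halton.random(N)                          # low-discrepancy Halton points
theta_mc = np.random.default_rng(0).random((N, M))    # plain MC points

est_qmc = model(theta_qmc).mean()
est_mc = model(theta_mc).mean()
truth = (2.0 / np.pi) ** 3
print(abs(est_qmc - truth), abs(est_mc - truth))      # qMC error is typically smaller
```

In the paper's setting, each call to `model` would be one full ug4/d3f simulation, which is why the sample count N matters so much.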
The construction of a gPC-based surrogate model [55] is similar to computing a Fourier expansion, where only the Fourier coefficients need to be computed. Well-known multivariate polynomials, such as Legendre, Hermite, Chebyshev, and Laguerre polynomials, are taken as a basis [82]. These basis functions possess useful properties, e.g., orthogonality. Usually, a small number of quadrature points is sufficient to compute the coefficients. Once the surrogate is constructed, it can be sampled at a much lower cost than the original stochastic PDE.
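A minimal one-dimensional sketch of this construction, assuming a uniform random input on [-1, 1] and a hypothetical smooth scalar QoI `f` in place of the PDE solution: the Legendre coefficients are obtained by Gauss-Legendre projection, and the surrogate is then evaluated like a truncated Fourier series.

```python
import numpy as np
from numpy.polynomial import legendre as L

# Hypothetical scalar QoI standing in for c(t, x, .) at a fixed point (t, x);
# theta is uniform on [-1, 1], matching the Legendre weight.
f = lambda th: np.exp(0.5 * th) * np.sin(th + 1.0)

p = 8                                    # gPC order
nodes, weights = L.leggauss(p + 1)       # Gauss-Legendre quadrature rule
fvals = f(nodes)

# Projection: c_k = (2k+1)/2 * sum_i w_i f(theta_i) P_k(theta_i)
coeffs = np.array([(2 * k + 1) / 2.0
                   * np.sum(weights * fvals * L.legval(nodes, [0] * k + [1]))
                   for k in range(p + 1)])

surrogate = lambda th: L.legval(th, coeffs)

# Orthogonality gives the statistics directly: E[f] = c_0,
# Var[f] = sum_{k>=1} c_k^2 / (2k+1)  (uniform density 1/2 on [-1,1]).
mean = coeffs[0]
var = np.sum(coeffs[1:] ** 2 / (2.0 * np.arange(1, p + 1) + 1.0))
print(abs(surrogate(0.3) - f(0.3)))      # small approximation error
```

The same projection carries over to the multivariate case, with tensorized basis polynomials and a multi-dimensional quadrature grid.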
To tackle the high numerical complexity, we implement a two-level parallelization. First, we run all quadrature or sampling points (also called scenarios) in parallel, with each scenario on a separate computing node. Second, each scenario itself is computed in parallel on 2-32 cores.
This work is structured as follows: In Section 2, we outline the model of the density-driven groundwater flow in porous media and the numerical methods for this type of problem. We describe the stochastic modeling, integration methods, and the generalized polynomial chaos expansion technique in Section 3. We present details of the parallelization of the computations in Section 4. Our numerical results, described in Section 5, demonstrate the practical performance of the parallelized solver for Elder-type problems with uncertain coefficients in 3D computational domains. We conclude this work with a discussion in Section 6.

Our contribution
We applied the gPC expansion to approximate the solution of the density-driven flow problem, a time-dependent, nonlinear, second-order differential equation. We estimated the propagation of input uncertainties in the porosity and permeability into the mass fraction.

Density-driven groundwater flow
The groundwater in the porous medium of an aquifer is subject to gravitation. Hence, any spatial variation of its density induces a flow. In this work, we consider the dependence of the density on the mass fraction of the dissolved salt. Traditional approaches for modelling this system are described in [5,6,7,20,66]. In this section, we briefly review the model to introduce the notation.
We consider a domain D ⊂ R^d, d = 3, consisting of two phases: the immobile solid porous matrix and a mobile liquid in its pores. We denote the porosity of the matrix by φ : D → R and its permeability by K : D → R^{d×d}. The liquid is a solution of NaCl in water. The mass fraction of the brine (a saturated saltwater solution) in the liquid phase is c(t, x) : [0, +∞) × D → [0, 1]. The density of the liquid phase is ρ = ρ(c) and its viscosity µ = µ(c).
The motion of the liquid phase through the solid matrix is characterized by the Darcy velocity q(t, x) : [0, +∞) × D → R^d. Mass conservation of the liquid phase and of the salt can be written as

∂_t(φρ) + ∇ · (ρq) = 0,    (1)
∂_t(φρc) + ∇ · (ρcq − ρD∇c) = 0,    (2)

where the tensor field D represents the molecular diffusion and the mechanical dispersion of the salt. For the velocity q, Darcy's law is used:

q = −(K/µ)(∇p − ρg).    (3)

Here, p = p(t, x) : [0, +∞) × D → R is the hydrostatic pressure and g = (0, . . . , −g)^T ∈ R^d the gravity vector.
In this work, we consider a special case of this general model. We assume that the porous medium is isotropic and characterized by a scalar permeability:

K = K(x) I.    (4)

For the density, we set

ρ(c) = ρ_0 + (ρ_1 − ρ_0) c,    (5)

where ρ_0 and ρ_1 are the densities of pure water and of the brine, respectively. Note that c ∈ [0, 1], where c = 0 corresponds to pure water and c = 1 to the saturated solution. The viscosity is assumed to be constant. Finally, the mechanical dispersion is neglected:

D = φ D_m I,    (6)

where D_m is the molecular diffusion coefficient. The fields φ(x) and K(x) will depend on the stochastic variables. The values of the other parameters used in this work are presented in Table 1.
In our numerical experiments, variants of the Elder problem [23,79] are considered. In this problem, a concentrated solution intrudes from the upper boundary into an aquifer filled with pure water. Due to the difference in density, a complicated, unstable flow arises. This leads to a specific distribution of the mass fraction typically described as "fingering" [40]. The time evolution of the mass fraction and the pressure in (1)-(3) is determined by the initial conditions for c. Note that the system (1)-(3) is nonlinear, and the model may have several stationary states [40]. Any changes in the porosity and permeability fields may have an essential effect on the solution: the same initial and boundary data may correspond to principally different asymptotic behaviors. The influence of this factor can be estimated by applying stochastic models.
We consider two spatial 3D domains, cf. Fig. 1. For the spatial discretization of (1)-(3), a vertex-centered FV scheme is used. Degrees of freedom (DoFs) for c and p are located in the grid nodes of the primary mesh. The domain in Fig. 1(a) is covered by a grid of 524,288 hexahedra, so that the total number of DoFs is 1,098,306. For the domain in Fig. 1(b), we use an unstructured mesh of 241,152 tetrahedra with 89,586 DoFs.
The resolution of the spatial and temporal discretizations plays an important role in the accuracy of the simulations. The distribution of the hydrogeological parameters of the porous medium must be represented correctly. Besides that, smoothing of the flow on a coarse grid due to numerical diffusion can suppress some phenomena and therefore reduce the accuracy of the uncertainty quantification. For this reason, the grid convergence of the simulations w.r.t. the spatial mesh has been tested. We ran our simulation on a twice regularly refined spatial mesh for the geometry from Fig. 1(b). It consists of 15,433,728 cells and has 2,822,552 DoFs. The obtained solution was similar to the solution computed on the coarser mesh mentioned above. Thus, further refinement does not change the behaviour of the solution essentially. To achieve reasonable computation times, the evaluation of every realization is parallelized.

Stochastic modeling of porosity, permeability and mass fraction
Due to a lack of knowledge or inaccurate measurements, the two input parameters, porosity and permeability, are unknown. One approach to deal with this situation is to treat both input parameters as uncertain, e.g., by introducing a prior distribution for these two quantities. We can also, based on expert knowledge, assume that their mean and variance are known.
In the following, we model the porosity by a random field φ(x, θ). We also assume that the permeability K is a function of φ, as shown in (7) and (8). Since the parameters are uncertain, the solution c is uncertain too, being a function of φ and K. Here we assume K to be isotropic. Usually, the distribution of φ(x, θ), x ∈ D, is non-trivial or even very complicated. Therefore, it is convenient, from the numerical point of view, to represent this complicated random field in some easy-to-compute basis, e.g., with Gaussian random variables. Often, many Gaussian random variables are required. We introduce θ = (θ_1, ..., θ_M, ...), where the θ_i are Gaussian random variables. The dependence (7) is specific to every material, and there is no general law. In our experiments, we use a Kozeny-Carman-like dependence [18]:

K(φ) = κ_KC · φ³ / (1 − φ)²,    (8)

where the scaling factor κ_KC is chosen to satisfy the equality K(E(φ))I = E(K). Further parameters of the standard Elder problem are listed in Table 1.
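A small sketch of a Kozeny-Carman-like dependence with the scaling chosen as described. The target values `phi_mean` and `K_mean` below are illustrative assumptions for the sketch, not values taken from Table 1.

```python
import numpy as np

def kozeny_carman(phi, kappa_kc):
    """Kozeny-Carman-like law K(phi) = kappa_KC * phi^3 / (1 - phi)^2."""
    return kappa_kc * phi**3 / (1.0 - phi) ** 2

# Illustrative numbers (assumed): choose kappa_KC so that K(E[phi]) matches
# a target mean permeability, as the scaling condition states.
phi_mean, K_mean = 0.1, 4.845e-13
kappa_kc = K_mean * (1.0 - phi_mean) ** 2 / phi_mean**3

print(kozeny_carman(phi_mean, kappa_kc))  # recovers K_mean by construction
```

With this scaling, any realization of the porosity field can be mapped pointwise to a permeability realization before each deterministic solve.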

Solution of the stochastic flow model
We start by sampling the random variables {θ_i}, computing the realisations of the porosity φ_i(x) = φ(x, θ_i) and the permeability K_i(x) = K(x, θ_i), and then solving equations (1)-(2). To compute the solution, we used the plugin d3f of the simulation framework ug4 [69,78]. This framework is a flexible tool for the numerical solution of non-stationary, nonlinear systems of PDEs and is parallelized and highly scalable. Equations (1)-(2) are discretized by a vertex-centered finite-volume method on unstructured grids in the geometric space, as presented in [30]. In particular, the consistent-velocity approach is used for the approximation of the Darcy velocity (3) [29,31]. The implicit Euler scheme is used for the time discretization. The implicit time-stepping scheme is chosen for its unconditional stability, as the velocity depends on the unknowns of the system. This is especially important for variable coefficients, as it is difficult to predict the range of variation of the velocity over the realizations.
In this discretization, we obtain a large sparse system of nonlinear algebraic equations in every time step, which we solve using the Newton method with line search. The sparse linear systems arising in the nonlinear iterations are solved by the BiCGStab method [85] with multigrid preconditioning (V-cycle, [37]), which proved to be very efficient in this case. In the multigrid cycle, we use ILU_β smoothers [38].
Here, Θ denotes a sample space, B a σ-algebra on Θ, and P a probability measure on (Θ, B). A random variable θ : Θ → R^M with finite variance can be represented in terms of a polynomial chaos expansion, e.g., in multivariate Hermite polynomials of Gaussian RVs [80]. Alternatively to Hermite polynomials, other orthogonal polynomials (Legendre, Chebyshev, Laguerre) [1,82,83] or, for instance, splines can be used. We assume that {Ψ_β}_{β∈J} is a basis of S = L²(Θ, P). The cardinality of the multi-index set J is infinite; therefore, we truncate it and obtain a finite set J_{M,p} and a subspace S_{M,p}. The solution c(t, x, θ) lies in the tensor product space in which the initial problem is solved. After discretization, we obtain c ∈ R^n ⊗ S_{M,p} ⊗ R^{n_t}, where n is the number of basis functions in the physical space, |J_{M,p}| is the dimension of the stochastic space, and n_t is the number of time steps.
The empirical mean value of c(t, x, θ) can be computed as

E[c](t, x) ≈ Σ_{i=1}^{N} w_i c(t, x, θ_i)

with quadrature points θ_i and weights w_i. Here, c(t, x, θ_i) is the solution evaluated at the fixed point θ = θ_i ∈ R^M. The sampled variance is computed as

Var[c](t, x) ≈ Σ_{i=1}^{N} w_i (c(t, x, θ_i) − E[c](t, x))².

Other well-known methods to compute the mean and the variance are the quasi-Monte Carlo, Monte Carlo, and multi-level Monte Carlo methods.
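In code, these weighted sums are one line each. The synthetic samples below stand in for the PDE solutions c(t, x, θ_i) at one fixed point (t, x), with equal weights as in MC/qMC sampling; the distribution and its parameters are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in "solutions" c(t, x, theta_i) at one fixed (t, x): one scalar per sample.
c_samples = rng.normal(loc=0.4, scale=0.05, size=500)
w = np.full(c_samples.size, 1.0 / c_samples.size)   # equal MC/qMC weights

mean_c = np.sum(w * c_samples)                      # E[c] ~ sum_i w_i c(theta_i)
var_c = np.sum(w * (c_samples - mean_c) ** 2)       # Var[c] ~ sum_i w_i (c_i - E[c])^2
print(mean_c, var_c)
```

For a quadrature rule (e.g., Clenshaw-Curtis), only the weight vector `w` changes; the estimator stays the same.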
The decomposition (11) can be understood as a response surface for c(t, x, θ). As soon as the response surface is built, the value c(t, x, θ) can be evaluated for any θ almost for free (only by evaluating the polynomial (11)). This is very practical if, for example, 10^6 samples of c(t, x, θ) are needed. Since the Legendre polynomials are L²-orthogonal, the coefficients c_β(t, x) can be computed by projection:

c_β(t, x) ≈ (1/⟨Ψ_β, Ψ_β⟩) Σ_{i=1}^{N} w_i c(t, x, θ_i) Ψ_β(θ_i),    (14)

where w_i are weights and θ_i quadrature points, defined, for instance, by a Gauss-Legendre integration rule. Once all gPC coefficients are computed, we substitute them into (11). For estimates of the truncation and aliasing errors, see [16]. There are different strategies to truncate the multi-index set J to J_{M,p}. In the multi-index set J_{M,p}, we keep only M RVs, i.e., θ = (θ_1, . . . , θ_M), and p_i is the monomial degree w.r.t. the variable θ_i. For the total multivariate polynomial degree p, we pose one of the following conditions: a) the total degree is bounded, Σ_{i=1}^{M} p_i ≤ p, or b) the maximal degree is bounded, max_{i=1,...,M} p_i ≤ p.
Remark 1. If only the variance and the mean value are of interest, we recommend using the collocation method with a sparse or full-tensor grid. If additional statistics are required, such as a sensitivity analysis, parameter identification, or computing the cdf or pdf, we recommend computing the gPC expansion.
Quadrature grids for computing multi-dimensional integrals can be constructed from products of 1D integration rules. To keep the computational cost at a minimum, a sparse-grid technique can be used for a certain class of smooth functions [11]. To compute (14), we apply the well-known Clenshaw-Curtis quadrature rules [15].
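A sketch of such a construction: a small 1D Clenshaw-Curtis rule (nodes are the Chebyshev extrema; the weights are recovered here by moment matching, which reproduces the classical weights for small n since the rule integrates the interpolating polynomial exactly), tensorized to M = 3 dimensions. With 5 nodes per direction this gives a 125-point full grid.

```python
import numpy as np
from itertools import product

def clenshaw_curtis_1d(n):
    """Clenshaw-Curtis nodes on [-1, 1] and weights, via moment matching (small n)."""
    x = np.cos(np.pi * np.arange(n) / (n - 1))          # Chebyshev-extreme nodes
    V = np.vander(x, n, increasing=True).T              # V[k, j] = x_j^k
    m = np.array([(1 - (-1) ** (k + 1)) / (k + 1)       # int_{-1}^{1} x^k dx
                  for k in range(n)])
    return x, np.linalg.solve(V, m)                     # weights reproduce all moments

# Full tensor grid in M = 3 dimensions from the 1D rule.
x1, w1 = clenshaw_curtis_1d(5)
nodes = np.array(list(product(x1, repeat=3)))           # 5^3 = 125 quadrature points
weights = np.array([wa * wb * wc for wa, wb, wc in product(w1, repeat=3)])
print(nodes.shape, weights.sum())                       # total weight = volume of cube
```

For higher stochastic dimensions, the same 1D rule would feed a sparse-grid (Smolyak) construction instead of the full product, which is exactly the trade-off discussed above.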

Computing probability density functions
After we have computed the gPC surrogate c(t, x, θ) in (12), we can compute the probability density functions, exceedance probabilities, and quantiles at selected points. For this, one evaluates the multivariate polynomial c(t, x, θ) at a sufficiently large number of points θ_i. The random variable c(θ) can be sampled, e.g., N_s = 10^6 times (no additional expensive simulations are required). The obtained sample can be used to evaluate the required statistics and the probability density function. The exceedance probability for some threshold c* can be estimated as

P(c > c*) ≈ (1/N_s) Σ_{i=1}^{N_s} 1[c(θ_i) > c*].

Having a sufficiently large sample set also makes it possible to estimate quantiles [47].
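This counting estimator is cheap because only the surrogate is sampled. In the sketch below, a simple polynomial in three Gaussian variables plays the role of the gPC surrogate at one fixed (t, x); its coefficients and the threshold are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
Ns = 10**6
theta = rng.standard_normal((Ns, 3))        # sample the stochastic input

# Cheap stand-in for the gPC polynomial c(t, x, theta) at one fixed (t, x).
surrogate = lambda th: 0.5 + 0.1 * th[:, 0] + 0.05 * th[:, 1] * th[:, 2]

c_star = 0.7                                # threshold mass fraction c*
p_exc = np.mean(surrogate(theta) > c_star)  # P(c > c*) by counting exceedances
print(p_exc)
```

A full simulation campaign could never afford 10^6 solver runs; the surrogate makes this sample size trivial.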

Numerical errors
By applying the gPC approximation, we introduce truncation and approximation errors [16,17,26,74]. Truncating the gPC expansion to the coefficients {c_β : β ∈ J_{M,p}}, we introduce the truncation error

ε_trunc = ‖c − Σ_{β∈J_{M,p}} c_β Ψ_β‖.

Additionally, we approximate the coefficients c_β(t, x) by their quadrature approximations c̃_β(t, x). This introduces the approximation error

ε_approx = ‖Σ_{β∈J_{M,p}} (c_β − c̃_β) Ψ_β‖.

The total error is bounded by the sum of both errors, ε ≤ ε_trunc + ε_approx.
Remark 2. The spatial resolution plays an important role in the simulations. The distribution of the parameters of the porous medium should be represented correctly. Furthermore, smoothing of the flow on a coarse grid due to numerical diffusion can suppress some phenomena and therefore reduce the accuracy of the uncertainty quantification. Thus, we must cover D with sufficiently fine grids and use small time steps. Achieving reasonable computation times then requires parallelizing the evaluation of every realization.

Parallelization
The computation of the mean value, variance, and other statistics is done in parallel. There are two levels of parallelization. First, we compute the solutions at all quadrature or sampling points in parallel (no communication is needed). Second, the solution at each single quadrature point is itself computed in parallel (communication is needed). We performed the computations on the Cray XC40 parallel cluster Shaheen II, provided by King Abdullah University of Science and Technology (KAUST) in Saudi Arabia. Shaheen II has 6174 nodes with two Intel Haswell 2.3 GHz CPUs per node and 16 processor cores per CPU. Every node has 128 GB RAM.
On the first level, we allocated, depending on the experiment, up to 600 computing nodes. This allowed us to compute up to 600 qMC simulations simultaneously. All these simulations are independent, and, therefore, no communication between nodes was required; the total computing time equals the time required for one simulation. On the second level, on each node, we allocated 32 computing cores for each single simulation. Thus, in total, we ran our code on up to 600 × 32 = 19,200 parallel cores. If a simulation is large or requires a lot of memory, we can allocate more than one node for it. ug4 allows allocating all available nodes (6174) for solving just one expensive simulation, i.e., theoretically, up to 6174 × 32 = 197,568 cores.
The parallelization of the framework ug4 is based on the distribution of the spatial grid between the processes and is implemented using MPI [69]. As the computation of the realizations involves nonlinear iterations and the solution of large sparse linear systems (see Section 2.3), quite intensive communication is required. Shaheen II uses the SLURM system to manage parallel jobs, and job arrays were used for the concurrent computation of the realizations. This was the most time-consuming phase, requiring hours of computing time.
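In miniature, the first (communication-free) level of parallelism looks like the following Python sketch, where `solve_scenario` is a toy stand-in for one complete ug4/d3f run for one parameter realization. In production this role is played by SLURM job arrays, not by a Python pool; everything here is an illustration under that assumption.

```python
from multiprocessing import Pool

# Level-1 parallelism in miniature: independent scenarios, no communication.
def solve_scenario(theta):
    """Toy stand-in for one ug4/d3f run with parameter realization theta."""
    phi = 0.1 + 0.01 * theta          # perturbed porosity, illustrative only
    return phi**3 / (1.0 - phi) ** 2  # pretend "QoI" returned by the solver

if __name__ == "__main__":
    thetas = [-1.0, -0.5, 0.0, 0.5, 1.0]
    with Pool(processes=2) as pool:   # one worker per "node"
        results = pool.map(solve_scenario, thetas)
    print(results)
```

The second level (parallelism inside one solve) has no analogue in this sketch; it lives inside the MPI-parallel solver itself.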

Solving the deterministic problem
The deterministic solver ug4 uses a multigrid method with a hierarchy of L nested spatial grids in D, with N_ℓ nodes on each level ℓ = 1, . . . , L. Each grid is distributed over all processors available on the node (32 in the tests below).
Equations (1)-(2) are discretized by a vertex-centered finite-volume method on unstructured grids in space, as presented in [30]. In particular, we use the "consistent velocity" approach for the approximation of the Darcy velocity (3), cf. [29,31]. The implicit Euler scheme is used for the time discretization. The choice of the implicit time-stepping scheme is motivated by its unconditional stability, as the velocity depends on the unknowns of the system.
After discretization, one obtains a large sparse system of nonlinear algebraic equations in every time step. It is solved by the Newton method with line search. The sparse linear systems arising in the nonlinear iterations are solved by the BiCGStab method (cf. [85]) with multigrid preconditioning (V-cycle, cf. [37]), which proved to be very efficient in this case. In the multigrid cycle, we use ILU_β smoothers [38]. The matrices are assembled as the Jacobians of the discretized nonlinear systems on each grid level. In each step of the Newton method, a linear system with the Jacobian is solved. The developers of ug4 also efficiently solve the problem of the optimal distribution of the grid hierarchy onto the parallel processes.
Software. ug4 is a flexible software system for simulating PDE-based models on high-performance parallel clusters [69,78]. The toolbox has been under development since the early 1990s and has successfully been applied to a variety of problems, such as density-driven and thermohaline flow in porous media, the Navier-Stokes equations, drug diffusion through human skin, and level-set methods. The aim of ug4 is to solve such problems on adaptive, hybrid grid hierarchies in 2D and 3D. Special care is taken with parallel performance, scalability, linear algebra algorithms, and data structures. In addition, ug4 is designed for user-friendly handling: its functionality is made available to non-programming users through scripts and graphical representations. The numerical solution of the system (1)-(2) has been performed using the d3f plugin of ug4, cf. [69,78].

Parallelepiped with 3 RVs
We consider an experiment with three random variables ξ = (ξ_1, ξ_2, ξ_3) entering the porosity coefficient. Note that in the present problem setting, the flow field is very complicated. The main process tends to show the same behaviour as in the 2D version of the Elder problem: the heavy, concentrated solution from the top boundary sinks and replaces the pure water initially located at the bottom, and this pure water is pressed upwards. As the diffusion is very small, the transport of the salt by these opposite flows creates a non-trivial pattern of the concentration field, typically described as "fingering". The particular pattern depends on the profile of the boundary conditions for c and on the initial flow. In our case, we see 5 "fingers" (4 at the corners and one at the center). As we see in Fig. 1(a), the isosurfaces {x : c(x) = 0.5} are almost identical for both methods; the same holds for the variance.

Elliptical cylinder with 3 RVs
In this example, we consider a cylindrical domain (see Fig. 1b). The goal is to see the influence of the computational domain on the solution. As we can see, the fingers differ in shape and number.
The porosity coefficient has three layers and is defined with three RVs as follows:

φ(t, x, θ) = 0.1 + 0.05 · c_0 · (ξ_1 (x/600) cos(πx/300) + ξ_2 sin(πy/150) + ξ_3 cos(πx/300) sin(πy/150)).

In Fig. 6, we present a comparison of the variance of c computed via qMC (200 simulations) and via gPC of order p = 4. All (m+p)!/(m! p!) = 7!/(3! · 4!) = 35 gPC coefficients are computed with a full Clenshaw-Curtis quadrature containing 125 quadrature points. In Figures 6a and 6b, the cutting plane has the normal vector (1, 0, 0), and in Figures 6c and 6d the vector (0, 1, 0). The variances computed via qMC and gPC lie in the intervals Var[c]_qMC ∈ [0, 0.14] and Var[c]_gPC ∈ [0, 0.13], respectively. The small difference is hardly visible in the pictures. Thus, we conclude that the gPC surrogate model computed on the full Clenshaw-Curtis quadrature grid can be used for approximating the mean and the variance of c.
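Evaluating a layered trigonometric porosity perturbation of this kind at the mesh vertices is straightforward. The coefficients below mirror the structure of such a field but should be read as an illustrative assumption, not as the paper's exact parameters.

```python
import numpy as np

def porosity(xyz, xi, c0=1.0):
    """Layered trigonometric porosity perturbation (illustrative coefficients)."""
    x, y = xyz[:, 0], xyz[:, 1]
    return 0.1 + 0.05 * c0 * (
        xi[0] * (x / 600.0) * np.cos(np.pi * x / 300.0)
        + xi[1] * np.sin(np.pi * y / 150.0)
        + xi[2] * np.cos(np.pi * x / 300.0) * np.sin(np.pi * y / 150.0)
    )

# One realization of the three RVs, evaluated at a few sample vertices.
verts = np.array([[0.0, 0.0, 0.0], [150.0, 75.0, -10.0], [300.0, 150.0, -20.0]])
phi = porosity(verts, xi=np.array([0.5, -0.3, 0.2]))
print(phi)
```

Each realization of (ξ_1, ξ_2, ξ_3) produces one such field, which is then mapped to a permeability field and handed to the deterministic solver.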
To demonstrate further similarities between the solutions computed via qMC and gPC, we visualize isosurfaces of the variance of the mass fraction in the cylindrical 3D reservoir (Figures 7 and 8). Figure 7c shows the comparison of both isosurfaces. As we can see, the results are very similar; the isosurface computed via qMC is slightly larger.
In Figure 8, we show the two isosurfaces {x : Var[c](x) = 0.07} computed via a) the qMC method (Fig. 8a) and b) the gPC surrogate of order 4 (Fig. 8b). Figure 8c shows the comparison of both isosurfaces. Again, the results are very similar; the isosurface computed via qMC is slightly larger.

Time evolution of c(t, x) and Var[c](t, x)
This is again an experiment with φ as in (24)-(25). The gPC coefficients were computed with 125 Gauss-Legendre quadrature points. Figure 10 shows the evolution of the mean of the mass fraction in the cylindrical 3D reservoir after a) 0, b) 0.55, c) 1.1, and d) 2.2 years. The cutting plane is x = 150. One can clearly see different stages of the "finger" building. Figure 11 shows the variance of the mass fraction in the cylindrical 3D reservoir after {0.55, 1.1, 2.2} years (after 100, 200, and 400 time steps, respectively).

Three layers example with cylindrical domain
The porosity coefficient inside each layer is random and is defined by the following equation:

Best practices
After numerous numerical tests, we compiled the following best-practice guide:
1. The number of uncertain variables used to describe the porosity coefficient depends on the model and the assumptions. We recommend starting with 1-3 variables to understand the phenomena better. Note that a heterogeneous porosity field in 3D may require hundreds of random variables and hence thousands of gPC coefficients. This is computationally unaffordable unless one uses advanced low-rank tensor techniques [21] or a multi-scale/homogenization strategy. In this work, we considered three random variables. The more random variables are present, the harder it can be to interpret the results, and an additional sensitivity analysis may be required.
2. A smooth dependence of the output QoI on the input uncertain parameters is assumed. We recommend checking this, at least visually, before computing the gPC.
3. The gPC-based surrogate showed reliable results in the cases where the variance of the porosity is not too high. For validation, one can use a qMC method.
4. We were not able to obtain a correct variance of c with the gPC method in tests where the variance of the porosity was large.
5. We recommend a maximal polynomial order in the gPC of p = 2, 3, 4, or 5. With our computational budget, we failed to obtain the correct variance of c for p > 5. Probably the quadrature rule was not accurate enough to capture the oscillatory behavior of the gPC polynomials; this is a well-known drawback of global polynomials.
6. The coefficients of the gPC surrogate require the calculation of high-dimensional integrals. For stochastic dimensions 4 and higher, we recommend a Gauss-Legendre (or Clenshaw-Curtis) sparse grid. For dimensions 1-3, we recommend a full grid.
7. In practice, the polynomial order in the gPC should stay in the range 2-4; otherwise, the number of simulations becomes too large.
8. The total numerical cost could be reduced by choosing the spatial and time step sizes adaptively, depending on the random perturbation.
However, in our implementation, it is hard to do this on the fly. Therefore, we recommend "conservative" settings, which are the same for all random simulations. Too large time or spatial step sizes may result in a lack of convergence.
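The combinatorics behind recommendations 1 and 7 can be checked directly: for a total-degree truncation, the number of gPC coefficients for M random variables and order p is the binomial coefficient C(M+p, p). A short sketch:

```python
from math import comb

def n_gpc_coeffs(M, p):
    """Size of the gPC basis |J_{M,p}| = (M+p)! / (M! p!) for total degree <= p."""
    return comb(M + p, p)

print(n_gpc_coeffs(3, 4))     # 35 coefficients, as in the cylinder experiment
print(n_gpc_coeffs(100, 3))   # 176,851: why hundreds of RVs are unaffordable
```

The count grows polynomially in M for fixed p, but the constants become prohibitive quickly, which is what motivates low-rank and multi-scale strategies for truly high-dimensional porosity fields.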

Conclusion
We solved the density-driven groundwater flow problem with uncertain porosity and permeability. An exact solution of this time-dependent, nonlinear problem is impossible due to the presence of natural uncertainties in the porosity and permeability of the reservoir. Therefore, we estimated the mean value and the variance of the solution, as well as the propagation of uncertainties from the random input parameters into the solution. For this, we first computed a multivariate polynomial approximation (gPC) of the solution and then used it to estimate the required statistics. The gPC method allows us to reduce the computational cost compared to the classical Monte Carlo method. gPC assumes that the output function c(t, x, θ) is square-integrable and smooth w.r.t. the uncertain input variables θ. We considered two different aquifers: a solid parallelepiped and a solid elliptic cylinder. One of our goals was to see how the domain geometry influences the shape of the fingers.
Since the considered problem is nonlinear, a high variance in the porosity may result in totally different solutions; for instance, the number of fingers, their intensity, shape, propagation time, and velocity may vary strongly.
From the implementation point of view, we have demonstrated how to use the parallel multigrid solver ug4 as a black-box solver. An additional novelty of this work is that all algorithms for uncertainty quantification are implemented on a distributed-memory system, where all solutions and gPC coefficients are distributed over multiple nodes. We see great potential in using the ug4 library as a black-box solver for the quantification of uncertainties. The results are reproducible, and the code is available online in a GitHub repository 1 .
The number of cells in the presented experiments varied from 241,152 to 15,433,728 for the cylindrical domain and from 524,288 to 4,194,304 for the parallelepiped. The maximal number of parallel processing units was 800 × 32, where 800 is the number of parallel nodes and 32 is the number of computing cores on each node. The computing time varied from 2 hours on the coarsest mesh to 23 hours on the finest mesh (for one simulation).
Although this work leaves many questions open, it is an important step towards the investigation of density-driven flow phenomena. Potential further applications are the estimation of contamination risks, monitoring the quality of drinking water, modeling of seawater intrusion into coastal aquifers, radioactive waste disposal, and contaminant plumes.
As a next step, we will integrate the ug4 multigrid solver into a multilevel Monte Carlo framework, with the idea of computing most samples on coarse grids and only a few on a fine grid.
Another idea is to infer the (unknown) porosity coefficient. Assume that some noisy measurements of c at locations {x_1, x_2, . . . , x_k} are available. Then the prior probability density function of c could be improved via a Bayesian update formula [50,57,58,70,71]. Herewith, the solution c(x*) at a point x* (or at k points, or in a sub-domain) could be computed at a much smaller numerical cost and with smaller memory requirements than the whole solution in the whole domain [45,46].