A Surrogate Modelling Approach Based on Nonlinear Dimension Reduction for Uncertainty Quantification in Groundwater Flow Models
Abstract
In this paper, we develop a surrogate modelling approach for capturing the output field (e.g. the pressure head) from groundwater flow models involving a stochastic input field (e.g. the hydraulic conductivity). We use a Karhunen–Loève expansion for a lognormally distributed input field and apply manifold learning (local tangent space alignment) to perform Gaussian process Bayesian inference using Hamiltonian Monte Carlo in an abstract feature space, yielding outputs for arbitrary unseen inputs. We also develop a framework for forward uncertainty quantification in such problems, including analytical approximations of the mean of the marginalized distribution (with respect to the inputs). To sample from the distribution, we present a Monte Carlo approach. Two examples are presented to demonstrate the accuracy of our approach: a Darcy flow model with contaminant transport in 2d and a Richards equation model in 3d.
Keywords
Groundwater flow models · Uncertainty quantification · Surrogate model · Karhunen–Loève expansion · Manifold learning

1 Introduction
Groundwater contamination, caused by landfills, wastewater seepage, hazardous chemical spillage, dumping of toxic substances or discharge from industrial processes (Karatzas 2017), is a major concern for both public and environmental health. Understanding the mechanisms and predicting the transport of contaminants through soils is therefore an important topic in groundwater flow modelling.
The control of groundwater quality relies on knowledge of the transport of chemicals to the groundwater through soil. The efficacy of remedial treatment and management of contaminated land depends on the accuracy of models used for the simulation of flow and solute transport. Modelling and simulation of hydraulic phenomena in soil are, however, hampered by the complex and heterogeneous nature of soils, as well as the broad range of influential factors involved. A number of simplified models have been developed to describe the small-scale physical, chemical (Boi et al. 2009; Foo and Hameed 2009; Vomvoris and Gelhar 1990) and biological mechanisms (Schäfer et al. 1998; Barry et al. 2002) that affect unsaturated flow and contaminant transport.
A current challenge in modelling solute transport in soils lies in characterizing and quantifying the uncertainties engendered by the natural heterogeneity of the soil. Accounting for such uncertainty can be vital for decision-making. Despite strong evidence from field-scale observations and experimental studies in relation to the effects of soil heterogeneity on the transport of contaminants (Al-Tabbaa et al. 2000; Kristensen et al. 2010), relatively few numerical models incorporate the effects of this uncertainty (Feyen et al. 1998; Aly and Peralta 1999; Sreekanth and Datta 2011a, 2014; Herckenrath et al. 2011).
Monte Carlo (MC) sampling is the default method for investigating uncertainties in a system (e.g. propagating uncertainty in the inputs), including in the context of groundwater flow modelling (Fu and Gomez-Hernandez 2009; Paleologos et al. 2006; Kourakos and Harter 2014; Maxwell et al. 2007; Herckenrath et al. 2011). MC estimates are extracted from multiple runs of the model using different realizations of the inputs, sampled from some distribution. While convergence is guaranteed as the number of runs increases, the slow rate of convergence typically demands a few thousand runs in order to extract reliable estimates of the statistics. If the model is computationally expensive, such a brute-force approach can be extremely time-consuming or perhaps even infeasible (Maxwell et al. 2007). Analytical stochastic methods have also been employed (Gelhar and Axness 1983; Gelhar 1986). Such methods can be useful for conceptual understanding of the transport process but are not applicable to practical scenarios.
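The brute-force MC procedure described above can be sketched as follows. This is a minimal illustration only: the `simulator` function, its input dimension and the number of runs are hypothetical stand-ins for an expensive groundwater solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(xi):
    """Stand-in for an expensive groundwater solver: maps an
    input-parameter vector to a scalar quantity of interest."""
    return np.exp(0.3 * xi[0]) + 0.1 * xi[1] ** 2

# Brute-force MC: draw inputs from their distribution, run the
# model for each realization, and form sample statistics.
n_runs = 5000
samples = np.array([simulator(rng.standard_normal(2)) for _ in range(n_runs)])

mean = samples.mean()
# The MC standard error decays like 1/sqrt(n_runs), which is why
# thousands of runs are typically needed for reliable estimates.
std_err = samples.std(ddof=1) / np.sqrt(n_runs)
```

The `1/sqrt(n_runs)` decay of the standard error is precisely the slow convergence that motivates the surrogate modelling approach of this paper.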
Such limitations and shortcomings could be resolved in theory by using surrogate models (also known as metamodels, emulators or simply surrogates) in place of the complex numerical codes; that is, computationally efficient approximations of the codes based on data-driven or reduced-order model (ROM) approaches. Surrogate models have been used in a limited number of groundwater flow modelling problems (Aly and Peralta 1999; Bhattacharjya and Datta 2005; Kourakos and Mantoglou 2009; Sreekanth and Datta 2011b; Ataie-Ashtiani et al. 2014) (we refer to Razavi et al. 2012; Ketabchi and Ataie-Ashtiani 2015 for reviews of the topic) and are typically based on artificial neural networks (ANNs) for approximating a small number of outputs within an optimization task. For example, Bhattacharjya and Datta used an ANN to approximate the salt concentration in pumped water at 8 pumping wells for 3 different times, in order to maximize the total withdrawal of water from a coastal aquifer while limiting the salt concentration (Bhattacharjya and Datta 2005). Similarly, Kourakos and Mantoglou used an ANN model to optimize 34 well pumping rates in a coastal aquifer (Kourakos and Mantoglou 2009).
Another popular surrogate modelling approach is the stochastic collocation method (Babuška et al. 2007) in which the approximate response is constrained to a subspace, typically spanned by a generalized Polynomial Chaos basis (Xiu and Karniadakis 2002). The coefficients in this basis are approximated via a collocation scheme. While these schemes yield good convergence rates, they scale poorly with the number of collocation points (Rajabi et al. 2015). Although sparse grid methods based on the Smolyak algorithm (Smolyak 1963) help to alleviate the increased computational burden, the resulting schemes are still severely limited by the input space dimensionality and tend to perform poorly with limited observations (Xiu and Hesthaven 2005; Xiu 2007; Nobile et al. 2008; Ma and Zabaras 2009).
When data are scarce, we may turn to statistical Bayesian approaches such as Gaussian process (GP) regression. GPs are stochastic processes used for inferring nonlinear and latent functions. They are defined as families of normally distributed random variables, indexed in this case by the input variable(s). GPs were first used as surrogate models in the seminal papers of Currin et al. (1988) and Sacks et al. (1989). The first applications of GP surrogate models to uncertainty quantification can be found in O'Hagan and Kingman (1978). Kernel methods such as GP models are well-established tools for analysing the relationships between input data and corresponding outputs of complex functions. Kernels encapsulate the properties of functions in a computationally efficient manner and provide flexibility in terms of model complexity (the functions used to approximate the target function) through variation of the functional form and parameters of the kernel.
GPs excel when data are scarce since they make a priori assumptions with regard to the relationship between data points. ANNs, by comparison, make fewer a priori assumptions and as a result require much larger data sets; they are, therefore, infrequently used for uncertainty quantification tasks. In the context of groundwater flow, very few applications of GPs can be found (Bau and Mayer 2006; Hemker et al. 2008; Borgonovo et al. 2012), the most likely explanations for which are the difficulty of implementing multi-output GP models and the lack of available information on, and software for, GP modelling in comparison with ANNs. Existing applications again deal with low-dimensional outputs; e.g. in Bau and Mayer (2006), the authors use a GP model to learn 4 well extraction rates for a pump-and-treat optimization problem.
Our aim in this paper is to develop a surrogate model for the values of a field variable in a groundwater flow model, e.g. the pressure, pressure head or flow velocity, at a large number of points in the spatial domain, in order to propagate uncertainty in a stochastic field input, e.g. the hydraulic conductivity. In such cases, simplified covariance structures (Conti and O'Hagan 2010) for the output space (response surface) or dimensionality reduction for the input and/or output space can be used. In Higdon et al. (2008), the authors use principal component analysis (PCA) to perform linear, non-probabilistic dimensionality reduction on the response in order to render a GP model tractable (independent learning of a small number of PCA coefficients). Such linear approaches (PCA, multidimensional scaling, factor analysis) are applicable only when data lie in or near a linear subspace of the output space.
For more complex response surfaces, manifold learning (nonlinear dimensionality reduction) can be employed, using, for example, kernel principal component analysis (kPCA), diffusion maps (Xing et al. 2016) or isomaps (Xing et al. 2015). In contrast, kPCA was used to perform nonlinear, non-probabilistic dimensionality reduction of the input space in Ma and Zabaras (2011). This can be useful when the input space is generated from observations (experimental data), but when the form is specified we can use linear dimension reduction methods such as the Karhunen–Loève (KL) expansion (Wong 1971).
In this paper we use manifold learning in the form of local tangent space alignment (LTSA) (Zhang and Zha 2004) to perform Bayesian inference (GP regression/emulation with Markov Chain Monte Carlo) in an abstract feature space and use an inverse (preimage) map to obtain the output field at a finite number of points for an arbitrary input. In contrast to diffusion maps, isomaps and kPCA, LTSA is a local method in that it approximates points on the manifold in localized regions (patches), rather than directly seeking a global basis for the feature space. This can potentially provide more accurate results, although this is of course dependent upon the sampling methodology for the points and the quality of the reconstruction mapping.
The aforementioned approach is combined with a Karhunen–Loève expansion for a lognormally distributed input field, and a framework for UQ is developed. We derive analytical forms for the output distribution by pushing the feature space Gaussian distribution through a locally linear reconstruction map. Additionally, we derive analytical estimates of the moments of the predictive distribution via approximate marginalization of the stochastic input. To sample from the hyperparameter and signal precision posteriors, we employ a Hamiltonian Monte Carlo scheme and use MC sampling to approximately marginalize the stochastic input distribution. The accuracy of the approach is demonstrated via two examples: a linear, steady-state Darcy's Law with a contaminant mass balance in a 2d domain (aquifer) and a time-dependent Richards equation evaluated at a fixed time in a 3d domain. In both cases we consider a stochastic hydraulic conductivity input.
The rest of the paper is organized as follows. In Sect. 2 we provide a detailed problem statement and outline the proposed solution. In Sect. 3 we outline LTSA, and in Sect. 4 we outline GP regression. In Sect. 5 we provide full details of the coupling of the methods and we demonstrate how the approach can be used to perform UQ tasks. In Sect. 6 we present the examples and discuss the results.
2 Problem Statement
Consider a well-defined, steady-state partial differential equation (PDE) with a scalar, isotropic random field input (e.g. a permeability or hydraulic conductivity), and a response (output) consisting of a scalar field, e.g. pressure head, concentration or flow velocity. We may generalize our approach to multiple or vector fields but in order to simplify the presentation we focus on a single scalar field. We can also apply the method we develop to dynamic problems by focusing on the spatial field at a given fixed time (the second example we present). For an arbitrary input field realization, solutions to the PDE are found using a numerical code (simulator, or solver) on a spatial mesh with \({k_{y}} \) fixed degrees of freedom, e.g. grid points in a finite difference grid, control volume centres in a finite volume mesh or spatial nodes in a finite element mesh combined with a nodal basis.
We denote the input field by \(K(\mathbf {x})\), where \(\mathbf {x}\in \mathcal{R}\subset \mathbb {R}^{d}\), \(d\in \{1,2,3\}\) denotes a spatial location and the notation makes explicit the spatial dependence. The model output (a scalar field) is denoted by \(u(\mathbf {x};K)\), i.e. it is a function of \(\mathbf {x}\) that is parameterized by \(K(\mathbf {x})\). The random input \(K(\mathbf {x})\) is defined on the whole of \(\mathcal{R}\) and therefore requires a discrete (finite-dimensional) approximation in order to obtain a numerical solution. Let \(\mathbf {x}_k\in \mathcal{R}\), \(k=1,\ldots ,{k_{y}} \) be a set of nodes or grid points and suppose that the simulator yields discrete approximations \(\{u_k = u(\mathbf {x}_k;K)\}_{k=1}^{{k_{y}} }\) of the output field \(u(\mathbf {x};K)\) in each run. Our goal is to approximate these simulator outputs for an arbitrary K.
2.1 Input Model: Karhunen–Loève Expansion
We note that different methods, including different quadrature rules or the use of projection schemes and Nystrom methods (Wan and Karniadakis 2006) can be used to solve the eigenvalue problem (3), all of which lead to a generalized eigenvalue problem in place of (5) (Betz et al. 2014). For example, if the finite element method is used, we may express the eigenfunctions as \(w_j(\mathbf {x})=\sum _k l_{j,k}\psi _k\) in terms of the finite element basis \(\{\psi _k\}_{k=1}^{{k_{y}} }\) and perform a Galerkin projection of (3) onto \(\text{ span }(\psi _1,\ldots ,\psi _{{k_{y}} })\) to yield a generalized eigenvalue problem for \(\{\lambda _j\}_{j=1}^{{k_{y}} }\) and the undetermined coefficients \(\{l_{j,k}\}_{j,k=1}^{{k_{y}} }\) (Ghanem and Spanos 2003).
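In the simplest collocation setting, the discrete KL eigenvalue problem reduces to an eigendecomposition of the covariance matrix evaluated on the grid. The sketch below illustrates this for a 1d grid with a squared-exponential kernel; the kernel, correlation length, variance and truncation threshold are illustrative choices, not those used in the paper.

```python
import numpy as np

# 1d grid and a squared-exponential covariance for the Gaussian
# log-field Z(x); kernel parameters are illustrative.
x = np.linspace(0.0, 1.0, 200)
ell, sigma2 = 0.2, 0.4
C = sigma2 * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell ** 2))

# Discrete analogue of the KL eigenvalue problem: eigenpairs of the
# covariance matrix (a simple collocation approximation).
lam, W = np.linalg.eigh(C)
lam, W = lam[::-1], W[:, ::-1]          # sort eigenpairs descending

# Truncate so that the retained modes capture 98% of the variance.
k_xi = int(np.searchsorted(np.cumsum(lam) / lam.sum(), 0.98)) + 1

# Sample a lognormal conductivity field
# K = exp(m_Z + sum_j sqrt(lam_j) * xi_j * w_j(x)).
rng = np.random.default_rng(1)
xi = rng.standard_normal(k_xi)
m_Z = np.log(40.0)
Z = m_Z + W[:, :k_xi] @ (np.sqrt(lam[:k_xi]) * xi)
K = np.exp(Z)
```

A Galerkin projection onto a finite element basis, as described above, would replace the plain eigendecomposition with a generalized eigenvalue problem but leave the sampling step unchanged.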
2.2 Statement of the Surrogate Model Problem
The high dimensionalities of the (original) input and output spaces pose great challenges for surrogate model development. The input space dimensionality can be reduced as described above. The intrinsic dimensionality of the output space is significantly lower than \({k_{y}} \) by virtue of correlations between outputs for different inputs, as well as physical constraints imposed by the simulator. This suggests that we treat \(\mathcal{Y}\) as a manifold and use manifold learning/dimensionality reduction to perform Bayesian inference on a lowdimensional (feature) space \(\mathcal{F}\) that is locally homeomorphic to \(\mathcal{Y}\). Below we introduce the manifold learning method employed, before recasting the emulation problem as one of inference in the feature space, together with a preimage (inverse) mapping to obtain solutions in \(\mathcal{Y}\) for arbitrary inputs \(\varvec{\xi }\).
3 Dimensionality Reduction and Manifold Learning: Feature Space Representations
Roughly speaking, a \({k_{z}} \)dimensional manifold \(\mathcal {Y}\) is a set for which all points can be parameterized by \({k_{z}} \) independent variables. A parameterization is called a coordinate system (or a chart) and it is not necessarily the case that a single coordinate system can describe the entire manifold. To characterize the manifold in such cases, we can introduce overlapping patches, each with its own system of (nonunique) coordinates.
Formally speaking, a smooth \({k_{z}} \)manifold is defined as a topological space \(\mathcal {Y}\) that is equipped with a maximal open cover \(\{U_{\alpha }\}_{\alpha \in \Gamma }\) consisting of coordinate neighbourhoods (or patches) \(U_{\alpha }\), together with a collection of homeomorphisms (coordinate charts) \(\phi _{\alpha }: U_{\alpha }\rightarrow \phi _{\alpha }(U_{\alpha }) \subset \mathbb {R}^{k_{z}} \) onto open subsets \(\phi _{\alpha }(U_{\alpha }) \subset \mathbb {R}^{k_{z}} \) such that \(\phi _{\alpha }(U_{\alpha }\cap U_{\beta })\) and \(\phi _{\beta }(U_{\alpha }\cap U_{\beta })\) are open in \(\mathbb {R}^{k_{z}} \); we say that \(\phi _{\alpha }\) and \(\phi _{\beta }\) are compatible. Moreover, the transition maps defining a change of coordinates \(\phi _{\beta } \circ \phi _{\alpha }^{-1}\) are diffeomorphisms for all \(\alpha ,\beta \in \Gamma \).
Let \(\mathcal {A}=\{(U_{\alpha },\phi _{\alpha })\}_{\alpha \in \Gamma }\) be an atlas on \(\mathcal {Y}\) (\(\{U_{\alpha }\}_{\alpha \in \Gamma }\) is a cover and the \(\{\phi _{\alpha }\}_{\alpha \in \Gamma }\) are pairwise compatible). Two smooth curves \(\gamma _0,\gamma _1:\mathbb {R}\rightarrow \mathcal {Y}\) are called \(\mathbf {y}\)-equivalent at a point \(\mathbf {y}\in \mathcal {Y}\) if for every \(\alpha \in \Gamma \) with \(\mathbf {y}\in U_{\alpha }\), we have \(\gamma _0(0)=\gamma _1(0)=\mathbf {y}\) and furthermore \((\text {d}/\text {d}t)_{t=0}\phi _{\alpha }(\gamma _0(t))=(\text {d}/\text {d}t)_{t=0}\phi _{\alpha }(\gamma _1(t))\). With this equivalence relation, the equivalence class of a smooth curve \(\gamma \) with \(\gamma (0)=\mathbf {y}\) is denoted \([\gamma ]_{\mathbf {y}}\) and the tangent space \(T_\mathbf {y}\mathcal {Y}\) of \(\mathcal {Y}\) at \(\mathbf {y}\) is the set of equivalence classes \(\{[\gamma ]_{\mathbf {y}}:\gamma (0)=\mathbf {y}\}\). The tangent space is a \({k_{z}} \)-dimensional vector space, which is seen more clearly by identifying \(T_\mathbf {y}\mathcal {Y}\) with the set of all derivations at \(\mathbf {y}\) [linear maps from \(C^\infty (\mathcal {Y})\) to \(\mathbb {R}\) satisfying the derivation (Leibniz) property].
We assume that the output space \(\mathcal {Y}\supset \mathbf {Y}\) is a manifold of dimension \({k_{z}} \ll {k_{y}} \) embedded in \(\mathbb {R}^{k_{y}} \). Representations of points in \(\mathcal {Y}\) and corresponding representations in the feature or latent space \(\mathcal{F}\subset \mathbb {R}^{k_{z}} \) can be related by some smooth and unknown function \(\mathbf {f}: \mathcal{F} \rightarrow \mathcal {Y}\). Manifold learning is concerned with the reconstruction of \(\mathbf {f}\) and its inverse given data points on the manifold, whereas dimensionality reduction is concerned with the representation of given points in \(\mathcal {Y}\) by corresponding points in the feature space \(\mathcal{F}\). Here we are interested primarily in dimensionality reduction and use Local Tangent Space Alignment (LTSA) (Zhang and Zha 2004). The tangent space at a point \(\mathbf {y}\) provides a low-dimensional linear approximation of points in a neighbourhood of \(\mathbf {y}\). We can approximate each point \(\mathbf {y}\) in a data set using a basis for \(T_\mathbf {y}\mathcal {Y}\) and use these approximations to find low-dimensional representations in a global coordinate system, by aligning the tangent spaces using local affine transformations (Zhang and Zha 2004). We note that this assumes the existence of a single chart (homeomorphism) \(\mathbf {f}^{-1}\).
Fixing the number of neighbours assumes that the manifold has a certain smoothness, while using the same number of neighbours for every tangent space assumes a global smoothness. These assumptions may result in inaccurate predictions, in which case we can use adaptive algorithms (Zou and Zhu 2011; Zhang et al. 2012; Wei et al. 2008). Similar adaptations can be made for other issues, such as robustness in the presence of noise (Zhan and Yin 2011).
We remark that LTSA is a nonparametric technique, in that an explicit form of \(\mathbf {f}\) is not available. This means that the out-of-sample problem does not have a parametric (explicit) solution. In other words, application of LTSA (the map \(\mathbf {f}^{\dagger }\approx \mathbf {f}^{-1}\)) to a point that was not in the data set can only be achieved by rerunning the entire algorithm with an updated data set that appends the new point. Nonparametric solutions to the out-of-sample problem have been developed, and one that is applicable to LTSA can be found in Li et al. (2005).
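A minimal LTSA embedding of high-dimensional samples can be obtained with scikit-learn's `LocallyLinearEmbedding` (with `method="ltsa"`). The synthetic data below, a curved 2d sheet embedded in a 50-dimensional ambient space, stand in for simulator outputs; all sizes are illustrative.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)

# Synthetic "simulator outputs": points on a 2d manifold (a curved
# sheet) embedded in a 50-dimensional ambient space.
N, k_y, k_z = 300, 50, 2
t = rng.uniform(-1, 1, size=(N, k_z))
A = rng.standard_normal((3, k_y))
Y = np.column_stack([t[:, 0], t[:, 1], t[:, 0] ** 2 + t[:, 1] ** 2]) @ A

# LTSA: approximate each point in its local tangent space (via local
# PCA over P neighbours) and align the local coordinates into one
# global k_z-dimensional embedding.
ltsa = LocallyLinearEmbedding(n_neighbors=15, n_components=k_z,
                              method="ltsa", eigen_solver="dense")
Z = ltsa.fit_transform(Y)
```

scikit-learn also exposes an approximate `transform` for new points, one practical workaround for the out-of-sample problem noted above.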
If we map points \(\mathbf {y}\in \mathcal{Y}\) to \(\mathcal{F}\) using \(\mathbf {f}^{\dagger }\) and perform inference in \(\mathcal{F}\), an approximation of \(\mathbf {f}\) is required in order to make predictions in the physical space \(\mathcal{Y}\). This is referred to as the preimage problem in manifold learning methods: given a point in the low-dimensional space, find a mapping to the original space (manifold). We outline an approximation of the preimage map in the next section.
3.1 Preimage Problem: Reconstruction of Points in the Manifold \(\mathcal {Y}\)
4 Gaussian Process Emulation in Feature Space
In Sect. 2.2, the surrogate model problem was defined as one of approximating the simulator mapping \(\varvec{\eta }:\mathcal {X}\rightarrow \mathcal {Y}\) given the data set \(\mathcal {D}' = \{\pmb {\Xi },\mathbf {Y}\}\) derived from runs of the simulator at selected design points \(\{\varvec{\xi }_n\}_{n=1}^N\). We can instead consider the simulator as a mapping \(\varvec{\eta }_\mathcal{F}\equiv \mathbf {f}^{-1}\circ \varvec{\eta }:\mathcal{X}\rightarrow \mathcal{F}\) from the input space to the feature space, i.e. \(\varvec{\eta }_\mathcal{F}(\cdot )= \mathbf {f}^{-1}(\varvec{\eta }(\cdot ))\). Application of LTSA to points on the manifold approximates this mapping with \(\mathbf {f}^{\dagger }\approx \mathbf {f}^{-1}\). The original data set \(\mathcal {D}' = \{\pmb {\Xi },\mathbf {Y}\}\) is replaced by the equivalent data set \(\mathcal {D}=\{\pmb {\Xi },\mathbf {Z}\}\) or \(\mathcal{D}=\left\{ (\varvec{\xi }_n,\,\mathbf {z}_n)\right\} _{n=1}^N\), where \(\mathbf {z}_n=\mathbf {f}^{\dagger }(\mathbf {y}_n)\approx \mathbf {f}^{-1}(\mathbf {y}_n)=\mathbf {f}^{-1}(\varvec{\eta }(\varvec{\xi }_n))=\varvec{\eta }_\mathcal{F}(\varvec{\xi }_n)\), and our aim is now to approximate the mapping \(\varvec{\eta }_\mathcal{F}(\cdot )\). Returning a general point \(\mathbf {z}=\varvec{\eta }_\mathcal{F}(\varvec{\xi })\) to the corresponding point \(\mathbf {y}\) in the space \(\mathcal{Y}\) is discussed in the next section.
In this work, a GP model is used to infer the mapping \(\varvec{\eta }_\mathcal{F}:\varvec{\xi }\mapsto \mathbf {z}\) by treating it as a realization of a (Gaussian) stochastic process indexed by the inputs \(\varvec{\xi }\). Specifically, we learn each component of \(\mathbf {z}\) separately (assuming independence) using a scalar GP model. Here and throughout, \(\mathcal{GP}(\cdot ,\cdot )\) denotes a GP, in which the first argument is the mean function and the second is the covariance (kernel) function.
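As a sketch of this step, the snippet below fits one scalar GP per feature-space coordinate using scikit-learn. Note that scikit-learn selects hyperparameters by maximizing the marginal likelihood (a point estimate), rather than by the HMC posterior sampling used in this paper, and the toy training data are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Toy training data: inputs xi (KL coefficients) and feature-space
# coordinates z; in the paper these come from LTSA applied to solver runs.
N, k_xi, k_z = 40, 3, 2
Xi = rng.standard_normal((N, k_xi))
Z = np.column_stack([np.sin(Xi[:, 0]), Xi[:, 1] * Xi[:, 2]])

# One scalar GP per feature-space coordinate, assuming independence
# between coordinates as in the text.
kernel = RBF(length_scale=np.ones(k_xi)) + WhiteKernel(noise_level=1e-4)
gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(Xi, Z[:, j])
       for j in range(k_z)]

# Predictive mean and standard deviation for a new input.
xi_star = rng.standard_normal((1, k_xi))
preds = [gp.predict(xi_star, return_std=True) for gp in gps]
z_mean = np.array([m[0] for m, _ in preds])
z_std = np.array([s[0] for _, s in preds])
```

Each coordinate's GP is fitted and queried independently, which is what makes the per-dimension parallelization mentioned in Sect. 7 straightforward.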
4.1 Sampling Hyperparameter Posterior with Hybrid Monte Carlo
5 Predictions
The physical models we consider have an unknown, stochastic input (e.g. the hydraulic conductivity). This represents a lack of knowledge of the input, which induces a random response (e.g. the pressure head). Quantifying the distribution over the response is referred to as a pushforward or forward problem. The pushforward measure is the distribution over the response, or quantity of interest derived from the response. Based on the methods of the preceding sections, we now present an emulation framework for interrogating the pushforward measure (the response distribution). We begin by describing in the next section how a single realization of the random response may be obtained given a single realization of the stochastic input. In Sect. 5.2, we then discuss how to quantify the pushforward measure (extract relevant statistics of the response).
5.1 Outputs Conditioned on Inputs
Due to the nature of the emulator, the prediction of a point \(\mathbf {z}\in \mathcal{F}\) is normally distributed. This distribution captures uncertainty in the predictions as a consequence of limited and noise-corrupted data. A common challenge when using reduced-dimensional representations is analytically propagating this distribution through a nonlinear preimage map [in this case \(\hat{\mathbf {f}}:\mathcal{F}\ni \mathbf {z}\mapsto \mathbf {y}\in \mathcal {Y}\) defined by Eq. (23)] for a test input \(\varvec{\xi }\).
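For concreteness, a locally linear preimage map of the kind discussed here can be sketched as follows. This is an illustrative stand-in, not the paper's Eq. (23); the neighbour count `P` and the affine-weight construction are assumptions.

```python
import numpy as np

def preimage(z_star, Z_train, Y_train, P=10):
    """Locally linear preimage sketch: express z_star as an affine
    combination of its P nearest feature-space neighbours, then apply
    the same weights to the corresponding outputs on the manifold."""
    d = np.linalg.norm(Z_train - z_star, axis=1)
    idx = np.argsort(d)[:P]
    Zn, Yn = Z_train[idx], Y_train[idx]

    # Affine (sum-to-one) least-squares weights in feature space.
    G = np.vstack([Zn.T, np.ones(P)])          # (k_z + 1, P)
    b = np.append(z_star, 1.0)
    w, *_ = np.linalg.lstsq(G, b, rcond=None)

    return w @ Yn                              # reconstruction in R^{k_y}
```

For a locally linear manifold this reconstruction is exact; on a curved manifold its error grows with the neighbourhood size, mirroring the trade-off in the choice of P discussed in Sect. 6.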
5.2 Marginalizing the Stochastic Input
6 Results and Discussion
We now assess the performance of the proposed method on two example partial differential equation problems: a Darcy flow problem with a contaminant mass balance, modelling steadystate groundwater flow in a 2d porous medium; and Richards equation, modelling singlephase flow through a 3d porous medium. As explained in Sect. 5, the analysis includes: (i) predictions that are conditioned on an input; and (ii) predictions that are marginalized over the stochastic input.
When making conditioned predictions, we use the conditional predictive distribution (30) for \(\mathbf {y}\), or the distribution (27) for \(\mathbf {z}\) in conjunction with the preimage map (23). As explained in Sect. 4.1, we place a prior over the hyperparameters \(\varvec{\Theta }\) and signal variances \(\pmb {\beta }\) and use an HMC scheme to sample from the posterior distributions over these parameters. Each sample can be used to obtain a different normal predictive distribution, conditioned on an input. We are therefore able to see how the predictive mean and variance change with respect to the uncertainty in the GP parameters. In the results, we plot the expectation and standard deviation of the first two moments of the predictive distribution.
For the forward UQ problem we marginalize the conditional predictive distributions over a stochastic input (Eq. 32) to obtain the pushforward measure (non-analytically). The mean can be found analytically using (A2) and (A3) together with the preimage map; alternatively, using Algorithm 1, we sample from the marginalized distribution via MC (Eq. 36).
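The MC marginalization step can be sketched as below, with a hypothetical `emulator_sample` standing in for the full chain of a GP predictive draw in feature space followed by the preimage map.

```python
import numpy as np

rng = np.random.default_rng(0)

def emulator_sample(xi):
    """Stand-in for the full chain: GP predictive draw in feature
    space followed by the preimage map (hypothetical surrogate)."""
    mean = np.tanh(xi).sum()
    return rng.normal(mean, 0.05)

# MC marginalization over the stochastic input: draw KL coefficients
# from their prior and sample the (cheap) emulator for each draw.
M = 2000
ys = np.array([emulator_sample(rng.standard_normal(5)) for _ in range(M)])

# Push-forward statistics of the quantity of interest; a KDE of `ys`
# would approximate the full push-forward density.
pf_mean, pf_std = ys.mean(), ys.std(ddof=1)
```

Because each draw costs only an emulator evaluation rather than a solver run, the sample size M can be made far larger than in brute-force MC.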
The accuracies of both the point predictions and the predictions of the pushforward measure are assessed by comparison with the true values obtained with the simulator (on the test inputs \(\{\varvec{\xi }_q^*\}_{q=1}^Q\)). We run the solver for each test input to generate the true response, denoted \(\tilde{\mathbf {y}}^{*}_q\). For the UQ comparison we again approximate the pdf using KDE (or simply extract the moments) based on \(\{\tilde{\mathbf {y}}^{*}_q\}_{q=1}^Q\). The latter approximation is guaranteed to converge to the truth as the number of test inputs increases.
6.1 Darcy Flow: Nonpoint Source Pollution
The contaminant balance and flow (Darcy) equations are decoupled. The latter is solved using the finite element method based on triangular elements and first-order (linear) shape functions. The boundary conditions are given by: (i) a constant head equal to 30 m on the left boundary; (ii) a general head boundary equal to 40 m with conductance equal to 160 m\(^3\) day\(^{-1}\) on the right boundary; and (iii) no flow on the top and bottom boundaries. Each land use polygon is assigned its own recharge rate. Stream rates are assigned directly to nodes. (Any node closer than 10 m to the stream is considered to be part of the stream.)
M1:
We set \(m_Z=\ln (40)\) and \(\sigma _Z^2 = 0.2\), yielding a mean for \(k(\mathbf {x})\) of 44.2 m day\(^{-1}\), which is close to the default value in the mSim package, and a standard deviation of 13.63 m day\(^{-1}\). The correlation lengths were chosen as \(l_1=2000\) m and \(l_2=1000\) m, which correspond to dimensionless values of 1/3 and 2/7, respectively. These choices require \({k_{\xi }} =5\) input dimensions to capture \(98\%\) of the generalized variance.
M2:
We set \(m_Z=\ln (36.18)\) and \(\sigma _Z^2 = 0.4\), again yielding a mean of 44.2 m day\(^{-1}\) and a standard deviation of 18.80 m day\(^{-1}\). We set \(l_1=2000\) m and \(l_2=1000\) m. \({k_{\xi }} =5\) captures \(98\%\) of the generalized variance.
M3:
We set \(m_Z=\ln (40)\) and \(\sigma _Z^2 = 0.4\) and reduce the correlation lengths to \(l_1=1000\) m and \(l_2=500\) m (dimensionless values of 1/6 and 1/7, respectively). We now require \({k_{\xi }} =15\) to capture \(98\%\) of the generalized variance.
For model M1, the distributions of \(\{e_q\}_{q=1}^{Q}\) for different training set sizes N are shown as boxplots for increasing values of P in Fig. 1. The performance of the emulator is good even for \(N=25\) training points (maximum \(e_q\) of approximately \(e^{-3}\)), although there is a clear decrease in the error when N is increased to 100. The relationship between the errors and P is more complicated. The errors are high for \(P<8\) (not shown in the boxplots) at all values of N and decrease as P increases. This is due to the linear approximation of points in local tangent spaces via PCA in the LTSA algorithm. As more points are added, the approximation improves. As P is increased beyond a certain value, however, the errors increase (this is most clearly visible for \(N=100\)). The reason for this behaviour is that for large enough neighbourhood sizes the linear approximation breaks down. Thus, there is an optimal choice of P for each value of N, and the higher the value of N the more sensitive are the errors to the value of P. In the subsequent results we use \(P=15\) unless otherwise specified.
The distributions are accurately estimated for all values of N. While the predictions improve as the number of training samples N increases, the true value does not always lie within the contours. This is because: (i) as stated earlier, an increased GP predictive variance acts to smooth the density, rather than increase the width of the contours; (ii) by choosing a priori the number of neighbours we also a priori assume a global smoothness of the emulator; and (iii) we have a preimage map \(\widehat{\mathbf {f}} : \mathcal {F}\rightarrow \mathcal Y\) whose error is unknown (as with all methods) and, unlike in probabilistic methods, not estimated.
6.2 Richards Equation: Unsaturated Flow in Porous Media
Consider a single-phase flow through a 3d porous region \(\mathcal{R}\subset \mathbb {R}^3\) containing unsaturated soil with a random permeability field. The vertical flow problem can be solved using Richards equation (Darcy's law combined with a mass balance). There are three standard forms of Richards equation: the pressure-head-based (h-based) form; the water-content-based (\(\theta \)-based) form; and the mixed-based form. For flow in saturated or layered soils, the h-based form is particularly appropriate (Huang et al. 1996; Shahraiyni and Ataie-Ashtiani 2011).
The boundary conditions are those used in Haverkamp et al. (1977), corresponding to laboratory experiments of infiltration in a plexiglass column packed with sand. Along the top boundary (surface) \(x_3=20\) cm, the pressure head is maintained at \(h=-20.7\) cm (\(\theta =0.267\) cm\(^3\) cm\(^{-3}\)), and along the bottom boundary \(x_3=0\) cm, it is maintained at \(h=-61.5\) cm. At all other boundaries a no-flow condition is imposed: \(\nabla h \cdot \mathbf{n}=0\), where \(\mathbf{n}\) is the unit, outwardly pointing normal to the surface. The initial condition is \(h(\mathbf {x},0)=-61.5\) cm.
The training and test input samples were drawn independently: \(\varvec{\xi }_n \sim \mathcal {N}\left( \mathbf{0},\mathbf{I}\right) \) and \(\varvec{\xi }_q^* \sim \mathcal {N}\left( \mathbf{0},\mathbf{I}\right) \) to yield \(\{\mathbf {y}_n\}_{n=1}^N\) for training and \(\{\widetilde{\mathbf {y}}^{*}_q\}_{q=1}^Q\) for testing and UQ. We set \(Q=5000\) and \(N\le 800\). As before, the manifold dimension was set to \({k_{z}} ={k_{\xi }} \). The number of neighbours P and the number of training points N were chosen as in the first example by examining the errors \(e_q=\Vert \widetilde{\mathbf {y}}^{*}_q-\overline{\mathbf {y}}_q^*\Vert /\Vert \widetilde{\mathbf {y}}^{*}_q\Vert \) on the test set, where again \(\widetilde{\mathbf {y}}^{*}_q\) is the solver output (truth) and \(\overline{\mathbf {y}}_q^*\) is the emulator prediction based on the GP predictive mean (26).
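Reading the metric above as a relative Euclidean error, it can be computed as, e.g.:

```python
import numpy as np

def rel_error(y_true, y_pred):
    """Relative Euclidean error: ||y_true - y_pred|| / ||y_true||."""
    return np.linalg.norm(y_true - y_pred) / np.linalg.norm(y_true)
```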
Equation (40) was solved using a finite difference scheme with first-order differencing for the first-order derivatives, central differencing for the second-order derivatives and a fully implicit backward Euler time stepping scheme. A Picard iteration scheme (Celia et al. 1990) is used at each time step. Details are provided in "Appendix C".
7 Numerical Computation
LTSA naturally lends itself to parallelization, since almost all computations are performed on each neighbourhood independently. After merging threads we need only solve an eigenvalue problem for an \(N\times N\) matrix. Similarly, using independent Gaussian processes across the latent dimensions leads to a natural parallelization framework.
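The per-dimension independence can be exploited directly. A minimal sketch, assuming a fixed RBF kernel (i.e. no hyperparameter inference, unlike the HMC-based treatment in the paper) and a thread pool over latent coordinates; all function and parameter names are illustrative:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def gp_fit_predict(X, y, Xs, lengthscale=1.0, noise=1e-6):
    """GP regression with an RBF kernel for a single latent coordinate."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return k(Xs, X) @ alpha  # predictive mean at the test inputs

def emulate(X, Z, Xs):
    """Fit one independent GP per latent coordinate, in parallel."""
    with ThreadPoolExecutor() as pool:
        cols = pool.map(lambda j: gp_fit_predict(X, Z[:, j], Xs), range(Z.shape[1]))
    return np.column_stack(list(cols))
```

Threads suffice here because NumPy’s linear algebra releases the GIL; a process pool would serve equally well for heavier per-dimension work.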
For large sample sizes and feature-space dimensions, storing each \(Q_i\) in memory can become infeasible (\(N \times {k_{y}} \times {k_{z}} \) elements in total). Similarly, for large sample and neighbourhood sizes, storing f can become infeasible (\(N\times k^2\) elements). In such cases, these tensors may be saved to file or recalculated online.
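One way to save such tensors to file while still accessing individual slices on demand is a disk-backed array; a sketch using NumPy’s memmap (the shapes and file name are illustrative, not the sizes used in the examples):

```python
import numpy as np
import os
import tempfile

# Illustrative sizes: N samples, k_y-dimensional outputs, k_z latent coordinates.
N, k_y, k_z = 100, 50, 4
path = os.path.join(tempfile.mkdtemp(), "Q_tensors.dat")

# Write the per-sample matrices Q_i to disk instead of keeping them resident.
Q = np.memmap(path, dtype=np.float64, mode="w+", shape=(N, k_y, k_z))
for i in range(N):
    Q[i] = np.random.default_rng(i).normal(size=(k_y, k_z))
Q.flush()

# Later, reopen read-only and load individual slices as needed.
Q_ro = np.memmap(path, dtype=np.float64, mode="r", shape=(N, k_y, k_z))
```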
The scalability of our approach is limited by the \(\mathcal {O}\left( N^3\right) \) computational complexity of Gaussian process regression. This can be alleviated by using sparse Gaussian process regression models, which introduce \(m \ll N\) inducing points and reduce the complexity to \(\mathcal {O}\left( m^2 N\right) \). We may also use active learning to reduce the number of training samples required.
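The \(\mathcal {O}\left( m^2 N\right) \) cost can be illustrated with a subset-of-regressors approximation (a sketch of one sparse scheme, not necessarily the one a given library implements; here the m inducing points are simply a thinned subset of the training inputs):

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential kernel matrix between point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ell ** 2)

def sor_predict(X, y, Xs, m=20, noise=1e-2):
    """Subset-of-regressors predictive mean with m inducing points.

    Forming A costs O(m^2 N) and solving it O(m^3), so the overall cost
    is O(m^2 N) rather than the O(N^3) of full GP regression.
    """
    Xm = X[:: max(1, len(X) // m)][:m]  # inducing points: a thinned subset of X
    Kmm = rbf(Xm, Xm)
    Knm = rbf(X, Xm)                    # N x m cross-covariance
    A = noise * Kmm + Knm.T @ Knm       # m x m system
    w = np.linalg.solve(A, Knm.T @ y)
    return rbf(Xs, Xm) @ w
```

With \(m = N\) the scheme recovers the usual GP predictive mean; shrinking m trades accuracy for cost.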
8 Summary and Conclusions
In this paper we developed a new approach to the emulation of a model involving a random field input and a field output, with a focus on problems arising in groundwater flow modelling. The main challenges are the high input and output space dimensionalities, which we dealt with using a KL expansion and manifold learning, respectively. We implemented LTSA on the given outputs (training data), which allowed us to perform Bayesian inference in a lowdimensional feature space. Furthermore, we developed a framework for UQ in such problems by marginalizing over the inputs, either analytically (the mean and possibly in some cases the standard deviation) or using MC sampling.
Testing the emulation method on two examples reveals that it performs well in certain cases. When the variance of the lognormal input is high or the correlation lengths of the normal process \(Z(\mathbf {x})\) are short, the accuracy suffers, as it does in all comparable approaches. Nevertheless, for the examples considered, the accuracy in terms of the forward UQ problem is high even in such cases. (Of course, further increases in the variance or decreases in the correlation lengths would eventually lead to unacceptably poor performance.)
The major drawback of the KL expansion approach (and similarly of circulant embedding) is the curse of dimensionality as the number of retained coefficients grows. Some progress can be made in this regard by using a Smolyak algorithm (Smolyak 1963) for sampling, or incremental local tangent space alignment (Liu et al. 2006) combined with active learning (Settles 2012), but the gains will be limited. Our method, in common with other methods except direct Monte Carlo or ROMs, is therefore potentially limited, given current computational resources, to problems in which the domain size is at most a few multiples of the shortest correlation length. The assumption of independence of the feature vector coordinates is also suboptimal. Since the number of coordinates is small, however, this assumption can easily be relaxed by adopting, e.g., a convolved GP approach.
9 Appendix A: Moments of the Marginal Distribution Over \(\mathbf {z}\)
10 Appendix B: Kernel Expectation
11 Appendix C: Numerical Algorithm for Richards Equation
Footnotes
 1.
Technically, the process is a random field if the index (here \(\mathbf {x}\)) lies in \(\mathbb {R}^{L}\) with \(L>1\), but the convention in the great majority of the literature is to use the term Gaussian process even in such cases.
 2.
Let \(\mathbb {P}_\mathcal{X}\) be a measure on \((\mathcal{X},\mathcal{F}_\mathcal{X})\). The pushforward measure of \(\mathbb {P}_\mathcal{X}\) under \(\pmb {\eta }:(\mathcal{X},\mathcal{F}_\mathcal{X},\mathbb {P}_\mathcal{X})\rightarrow (\mathcal{Y},\mathcal{F}_\mathcal{Y},\mathbb {P}_\mathcal{Y})\) is defined as \(\mathbb {P}_\mathcal{Y}(F)=\mathbb {P}_\mathcal{X}\circ \pmb {\eta }^{-1}(F)\) for \(F\in \mathcal{F}_\mathcal{Y}\). We characterize the measures by their probability density functions (pdfs) with respect to Lebesgue measure. In this work a Gaussian distribution is placed on the inputs.
 3.
See http://subsurface.gr/joomla/msim_doc/twoD_examples_help.html for full details of the implementation, including the domain, mesh generation and boundary conditions. Last accessed 29 August 2017.
 4.
If \(Z(\mathbf {x})\) has a mean and variance of \(\mu \) and \(\nu \), then the mean and variance of the lognormal process \(\exp (Z(\mathbf {x}))\) are \(\mu '=\exp (\mu +\nu /2)\) and \(\nu '=\exp (2\mu +\nu )(\exp (\nu )-1)\), respectively.
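These closed forms are easy to verify numerically; a small sketch with arbitrary illustrative values of \(\mu \) and \(\nu \):

```python
import numpy as np

mu, nu = 0.3, 0.5                                 # mean and variance of Z(x) at a point
mean_ln = np.exp(mu + nu / 2)                     # mu' = exp(mu + nu/2)
var_ln = np.exp(2 * mu + nu) * (np.exp(nu) - 1)   # nu' = exp(2mu + nu)(exp(nu) - 1)

# Monte Carlo check: exponentiate normal samples and compare moments.
z = np.random.default_rng(1).normal(mu, np.sqrt(nu), size=1_000_000)
x = np.exp(z)
```

The sample mean and variance of `x` agree with `mean_ln` and `var_ln` to within Monte Carlo error.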
Notes
Acknowledgements
CG would like to acknowledge the Warwick Centre for Predictive Modelling for a Ph.D. scholarship. AS would like to acknowledge the EPSRC, UK, for financial support (Grant No. EP/P012620/1).
References
 Al-Tabbaa, A., Ayotamuno, J., Martin, R.: One-dimensional solute transport in stratified sands at short travel distances. J. Hazard. Mater. 73(1), 1–15 (2000)
 Aly, A.H., Peralta, R.C.: Optimal design of aquifer cleanup systems under uncertainty using a neural network and a genetic algorithm. Water Resour. Res. 35(8), 2523–2532 (1999)
 Ataie-Ashtiani, B., Ketabchi, H., Rajabi, M.M.: Optimal management of a freshwater lens in a small island using surrogate models and evolutionary algorithms. J. Hydrol. Eng. 19(2), 339–354 (2014)
 Babuška, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. 45(3), 1005–1034 (2007)
 Barry, D., Prommer, H., Miller, C., Engesgaard, P., Brun, A., Zheng, C.: Modelling the fate of oxidisable organic contaminants in groundwater. Adv. Water Resour. 25(8), 945–983 (2002)
 Bau, D.A., Mayer, A.S.: Stochastic management of pump-and-treat strategies using surrogate functions. Adv. Water Resour. 29(12), 1901–1917 (2006)
 Betz, W., Papaioannou, I., Straub, D.: Numerical methods for the discretization of random fields by means of the Karhunen–Loève expansion. Comput. Methods Appl. Mech. Eng. 271, 109–129 (2014)
 Bhattacharjya, R.K., Datta, B.: Optimal management of coastal aquifers using linked simulation optimization approach. Water Resour. Manag. 19(3), 295–320 (2005)
 Božić, D., Stanković, V., Gorgievski, M., Bogdanović, G., Kovačević, R.: Adsorption of heavy metal ions by sawdust of deciduous trees. J. Hazard. Mater. 171(1), 684–692 (2009)
 Borgonovo, E., Castaings, W., Tarantola, S.: Model emulation and moment-independent sensitivity analysis: an application to environmental modelling. Environ. Model. Softw. 34, 105–115 (2012)
 Celia, M.A., Ahuja, L.R., Pinder, G.F.: Orthogonal collocation and alternating-direction procedures for unsaturated flow problems. Adv. Water Resour. 10(4), 178–187 (1987)
 Celia, M.A., Bouloutas, E.T., Zarba, R.L.: A general mass-conservative numerical solution for the unsaturated flow equation. Water Resour. Res. 26(7), 1483–1496 (1990)
 Conti, S., O’Hagan, A.: Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference 140(3), 640–651 (2010)
 Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: A Bayesian approach to the design and analysis of computer experiments. Tech. rep. ORNL-6498, Oak Ridge National Laboratory (1988)
 Feyen, J., Jacques, D., Timmerman, A., Vanderborght, J.: Modelling water flow and solute transport in heterogeneous soils: a review of recent approaches. J. Agric. Eng. Res. 70(3), 231–256 (1998)
 Foo, K., Hameed, B.: An overview of landfill leachate treatment via activated carbon adsorption process. J. Hazard. Mater. 171(1), 54–60 (2009)
 Fu, J., Gómez-Hernández, J.J.: Uncertainty assessment and data worth in groundwater flow and mass transport modeling using a blocking Markov chain Monte Carlo method. J. Hydrol. 364(3), 328–341 (2009)
 Gelhar, L.W.: Stochastic subsurface hydrology from theory to applications. Water Resour. Res. 22(9S), 135S–145S (1986)
 Gelhar, L.W., Axness, C.L.: Three-dimensional stochastic analysis of macrodispersion in aquifers. Water Resour. Res. 19(1), 161–180 (1983)
 Ghanem, R.G., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach. Springer, New York (2003)
 Girard, A., Murray-Smith, R.: Gaussian processes: prediction at a noisy input and application to iterative multiple-step ahead forecasting of time-series. In: Switching and Learning in Feedback Systems, Lecture Notes in Computer Science, pp. 546–551. Springer (2003)
 Haverkamp, R., Vauclin, M., Touma, J., Wierenga, P., Vachaud, G.: A comparison of numerical simulation models for one-dimensional infiltration. Soil Sci. Soc. Am. J. 41(2), 285–294 (1977)
 Hemker, T., Fowler, K.R., Farthing, M.W., von Stryk, O.: A mixed-integer simulation-based optimization approach with surrogate functions in water resources management. Optim. Eng. 9(4), 341–360 (2008)
 Herckenrath, D., Langevin, C.D., Doherty, J.: Predictive uncertainty analysis of a saltwater intrusion model using null-space Monte Carlo. Water Resour. Res. 47(5), W05504 (2011)
 Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. 103(482), 570–583 (2008)
 Huang, K., Mohanty, B., Van Genuchten, M.T.: A new convergence criterion for the modified Picard iteration method to solve the variably saturated flow equation. J. Hydrol. 178(1–4), 69–91 (1996)
 Karatzas, G.P.: Developments on modeling of groundwater flow and contaminant transport. Water Resour. Manag. 31(10), 3235–3244 (2017)
 Ketabchi, H., Ataie-Ashtiani, B.: Review: coastal groundwater optimization—advances, challenges, and practical solutions. Hydrogeol. J. 23(6), 1129–1154 (2015)
 Kourakos, G., Harter, T.: Parallel simulation of groundwater non-point source pollution using algebraic multigrid preconditioners. Comput. Geosci. 18(5), 851–867 (2014)
 Kourakos, G., Mantoglou, A.: Pumping optimization of coastal aquifers based on evolutionary algorithms and surrogate modular neural network models. Adv. Water Resour. 32(4), 507–521 (2009)
 Kourakos, G., Klein, F., Cortis, A., Harter, T.: A groundwater non-point source pollution modeling framework to evaluate long-term dynamics of pollutant exceedance probabilities in wells and other discharge locations. Water Resour. Res. 48(6), W00L13 (2012)
 Kristensen, A.H., Poulsen, T.G., Mortensen, L., Moldrup, P.: Variability of soil potential for biodegradation of petroleum hydrocarbons in a heterogeneous subsurface. J. Hazard. Mater. 179(1), 573–580 (2010)
 Li, H., Teng, L., Chen, W., Shen, I.F.: Supervised learning on local tangent space. In: Advances in Neural Networks – ISNN 2005, Lecture Notes in Computer Science, pp. 546–551. Springer (2005)
 Liu, X., Yin, J., Feng, Z., Dong, J.: Incremental manifold learning via tangent space alignment. In: Schwenker, F., Marinai, S. (eds.) Artificial Neural Networks in Pattern Recognition, pp. 107–121. Springer, Berlin (2006)
 Ma, X., Zabaras, N.: An adaptive hierarchical sparse grid collocation algorithm for the solution of stochastic differential equations. J. Comput. Phys. 228(8), 3084–3113 (2009)
 Ma, X., Zabaras, N.: Kernel principal component analysis for stochastic input model generation. J. Comput. Phys. 230(19), 7311–7331 (2011)
 Maxwell, R.M., Welty, C., Harvey, R.W.: Revisiting the Cape Cod bacteria injection experiment using a stochastic modeling approach. Environ. Sci. Technol. 41(15), 5548–5558 (2007)
 Nobile, F., Tempone, R., Webster, C.G.: A sparse grid stochastic collocation method for partial differential equations with random input data. SIAM J. Numer. Anal. 46(5), 2309–2345 (2008)
 O’Hagan, A., Kingman, J.F.C.: Curve fitting and optimal design for prediction. J. R. Stat. Soc. Ser. B (Methodological) 40, 1–42 (1978)
 Paleologos, E.K., Avanidou, T., Mylopoulos, N.: Stochastic analysis and prioritization of the influence of parameter uncertainty on the predicted pressure profile in heterogeneous, unsaturated soils. J. Hazard. Mater. 136(1), 137–143 (2006)
 Rajabi, M.M., Ataie-Ashtiani, B., Simmons, C.T.: Polynomial chaos expansions for uncertainty propagation and moment independent sensitivity analysis of seawater intrusion simulations. J. Hydrol. 520, 101–122 (2015)
 Rathfelder, K., Abriola, L.M.: Mass conservative numerical solutions of the head-based Richards equation. Water Resour. Res. 30(9), 2579–2586 (1994)
 Ray, R., Mohanty, B.: Some Numerical Investigations of the Richards’ Equation. ASAE Paper 92-2586, American Society of Agricultural Engineers, St Joseph (1992)
 Razavi, S., Tolson, B.A., Burn, D.H.: Review of surrogate modeling in water resources. Water Resour. Res. 48(7), W07401 (2012)
 Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. 4(4), 409–423 (1989)
 Schäfer, D., Schäfer, W., Kinzelbach, W.: Simulation of reactive processes related to biodegradation in aquifers. J. Contam. Hydrol. 31(1), 167–186 (1998)
 Settles, B.: Active learning. Synth. Lect. Artif. Intell. Mach. Learn. 6(1), 1–114 (2012)
 Shahraiyni, H.T., Ataie-Ashtiani, B.: Mathematical forms and numerical schemes for the solution of unsaturated flow equations. J. Irrig. Drain. Eng. 138(1), 63–72 (2011)
 Simonoff, J.S.: Smoothing Methods in Statistics. Springer, New York (1996)
 Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl. Akad. Nauk SSSR 4, 240–243 (1963)
 Sreekanth, J., Datta, B.: Coupled simulation-optimization model for coastal aquifer management using genetic programming-based ensemble surrogate models and multiple-realization optimization. Water Resour. Res. 47(4), W04516 (2011)
 Sreekanth, J., Datta, B.: Comparative evaluation of genetic programming and neural network as potential surrogate models for coastal aquifer management. Water Resour. Manag. 25(13), 3201–3218 (2011)
 Sreekanth, J., Datta, B.: Stochastic and robust multi-objective optimal management of pumping from coastal aquifers under parameter uncertainty. Water Resour. Manag. 28(7), 2005–2019 (2014)
 Vomvoris, E.G., Gelhar, L.W.: Stochastic analysis of the concentration variability in a three-dimensional heterogeneous aquifer. Water Resour. Res. 26(10), 2591–2602 (1990)
 Wan, X., Karniadakis, G.E.: A sharp error estimate for the fast Gauss transform. J. Comput. Phys. 219(1), 7–12 (2006)
 Wei, J., Peng, H., Lin, Y.S., Huang, Z.M., Wang, J.B.: Adaptive neighborhood selection for manifold learning. In: Machine Learning and Cybernetics, 2008 International Conference on, vol. 1, pp. 380–384. IEEE (2008)
 Wong, E.: Stochastic Processes in Information and Dynamical Systems. McGraw-Hill, New York (1971)
 Xing, W., Shah, A.A., Nair, P.B.: Reduced dimensional Gaussian process emulators of parametrized partial differential equations based on Isomap. Proc. R. Soc. Lond. A 471(2174), 20140697 (2015)
 Xing, W., Triantafyllidis, V., Shah, A., Nair, P., Zabaras, N.: Manifold learning for the emulation of spatial fields from computational models. J. Comput. Phys. 326, 666–690 (2016)
 Xiu, D.: Efficient collocational approach for parametric uncertainty analysis. Commun. Comput. Phys. 2(2), 293–309 (2007)
 Xiu, D., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. 27(3), 1118–1139 (2005)
 Xiu, D., Karniadakis, G.E.: The Wiener–Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. 24(2), 619–644 (2002)
 Zarba, R.L., Bouloutas, E., Celia, M.: General mass-conservative numerical solution for the unsaturated flow equation. Water Resour. Res. 26(7), 1483–1496 (1990)
 Zhan, Y., Yin, J.: Robust local tangent space alignment via iterative weighted PCA. Neurocomputing 74(11), 1985–1993 (2011)
 Zhang, D., Lu, Z.: An efficient, high-order perturbation approach for flow in random porous media via Karhunen–Loève and polynomial expansions. J. Comput. Phys. 194(2), 773–794 (2004)
 Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313–338 (2004)
 Zhang, Z., Wang, J., Zha, H.: Adaptive manifold learning. IEEE Trans. Pattern Anal. Mach. Intell. 34(2), 253–265 (2012)
 Zou, X., Zhu, Q.: Adaptive neighborhood graph for LTSA learning algorithm without free-parameter. Int. J. Comput. Appl. 19(4), 28–33 (2011)
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.