# Bayesian Uncertainty Propagation Using Gaussian Processes

## Abstract

Classic non-intrusive uncertainty propagation techniques typically require a significant number of model evaluations in order to yield convergent statistics. In practice, however, the computational complexity of the underlying computer codes significantly limits the number of observations that one can actually make. In such situations the estimates produced by classic approaches cannot be trusted, since the limited number of observations induces additional epistemic uncertainty. The goal of this chapter is to highlight how the Bayesian formalism can quantify this epistemic uncertainty and provide robust predictive intervals for the statistics of interest with as few simulations as one has available. It is shown how the Bayesian formalism can be materialized by employing the concept of a Gaussian process (GP). In addition, several practical aspects that depend on the nature of the underlying response surface, such as the treatment of spatiotemporal variation and multi-output responses, are discussed. The practicality of the approach is demonstrated by propagating uncertainty through a dynamical system and an elliptic partial differential equation.

## Keywords

Epistemic uncertainty; expensive computer code; expensive computer simulations; Gaussian process; uncertainty propagation

## 1 Introduction

National laboratories, research groups, and corporate R&D departments have spent decades and billions of dollars to develop realistic multi-scale/multi-physics computer codes for a wide range of engineered systems such as aircraft engines, nuclear reactors, and automobiles. The driving force behind the development of these models has been their potential use for designing systems with desirable properties. However, this potential has been hindered by the inherent presence of uncertainty attributed to the lack of knowledge about initial/boundary conditions, material properties, geometric features, the form of the models, as well as the noise present in the experimental data used in model calibration.

This chapter focuses on the task of propagating parametric uncertainty through a given computer code, ignoring the uncertainty introduced by discretizing an ideal mathematical model. This task is known as the *uncertainty propagation* (UP) problem. Even though the discussion is limited to the UP problem, all the ideas presented can be extended to the problems of model calibration and design optimization under uncertainty, although this is beyond the scope of this chapter.

The simplest approach to the solution of the UP problem is the Monte Carlo (MC) approach. When using MC, one simply samples the parameters, evaluates the model, and records the response. By post-processing the recorded responses, it is possible to quantify any statistic of interest. However, obtaining convergent statistics via MC for computationally expensive realistic models is not feasible, since one may be able to run only a handful of simulations.
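The MC loop described above can be sketched in a few lines. The `model` below is a hypothetical, cheap stand-in for an expensive simulator (in practice each evaluation might take hours, which is precisely the difficulty discussed next); the choice of input distribution is also an illustrative assumption.

```python
import numpy as np

# Hypothetical cheap stand-in for an expensive computer code.
def model(xi):
    return np.sin(xi[0]) + 0.5 * xi[1] ** 2

rng = np.random.default_rng(0)

# 1. Sample the uncertain parameters xi ~ p(xi) (here: standard normal, an assumption).
samples = rng.standard_normal(size=(10_000, 2))

# 2. Evaluate the model at each sample and record the response.
responses = np.array([model(xi) for xi in samples])

# 3. Post-process the recorded responses for any statistic of interest.
mean_est = responses.mean()
var_est = responses.var()
print(mean_est, var_est)
```

The point of the sketch is step 2: every statistic requires thousands of model evaluations, which is exactly what an expensive code cannot afford.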

The most common way of dealing with expensive models is to replace them with inexpensive surrogates. That is, one evaluates the model on a set of design points and then tries to develop an accurate representation of the response surface based on what he observes. The most popular such approach is to expand the response in a generalized polynomial chaos basis (gPC) [4, 51, 52, 53] and approximate its coefficients with a Galerkin projection computed with a quadrature rule, e.g., sparse grids [44]. In relatively low-dimensional parametric settings, these techniques outperform MC by orders of magnitude. In addition, it is possible to prove rigorous convergence results for gPC. However, the quality of the estimates when using a very limited number of simulations is questionable. The main reason is that these techniques make no attempt to quantify the epistemic uncertainty induced by the limited number of observations.

The quantification of the epistemic uncertainty induced by the limited number of simulations requires statistical methodologies and, specifically, a Bayesian approach. The first statistical approach to computer code surrogate building was put forward by Currin et al. [16] and Sacks et al. [42], both using Gaussian processes (GP). This work was put in a Bayesian framework by Currin et al. [17] and Welch et al. [50]. The first fully Bayesian treatment of the UP problem has its roots in the Bayes-Hermite quadrature of O’Hagan [36], and it was put in the modern context in O’Hagan et al. [38] and Oakley and O’Hagan [34]. Of great relevance is the Gaussian emulation machine for sensitivity analysis (GEM-SA) software of O’Hagan and Kennedy [37]. The work of the authors in [7, 10] constitutes a continuation along this path. The present chapter is a comprehensive review of the Bayesian approach to UP to date.

The outline of the chapter is as follows. It starts with a generic description of physical models and the computer emulators used for their evaluation, followed by the definition of the UP problem. Then, it discusses the Bayesian approach to UP introducing the concept of a Bayesian surrogate and showing how the epistemic uncertainty induced by limited observations can be represented. Then, it shows how the Bayesian framework can be materialized using GPs, by providing practical guidelines for the treatment of spatiotemporal multi-output responses, training the hyper-parameters of the model, and quantifying epistemic uncertainty due to limited data by sampling candidate surrogates. The chapter ends with three demonstrative examples, a synthetic one-dimensional example that clarifies some of the introduced concepts, a dynamical system with uncertain initial conditions, and a stochastic partial differential equation.

## 2 Methodology

### 2.1 Physical Models

A *physical model* is mathematically equivalent to a multi-output function,

$$\displaystyle{ \mathbf{f} : \mathcal{X}_{s} \times \mathcal{X}_{t} \times \mathcal{X}_{\xi }\rightarrow \mathcal{Y}, }$$(1)

where \(\mathcal{X}_{s} \subset \mathbb{R}^{d_{s}}\), with *d*_{ s } = 0, 1, 2, or 3, is the *spatial* domain; \(\mathcal{X}_{t} \subset \mathbb{R}^{d_{t}}\), with *d*_{ t } = 0 or 1, is the *time* domain; \(\mathcal{X}_{\xi }\subset \mathbb{R}^{d_{\xi }}\) is the *parameter* domain; and \(\mathcal{Y}\subset \mathbb{R}^{d_{y}}\) is the *output* space. Note that, under this notation, *d*_{ s } = 0 or *d*_{ t } = 0 is interpreted as **f**(⋅ ) having no spatial or time components, respectively. One thinks of \(\mathbf{f}(\mathbf{x}_{s},t,\boldsymbol{\xi })\) as the model response at the spatial location \(\mathbf{x}_{s} \in \mathcal{X}_{s}\) at time \(t \in \mathcal{X}_{t}\) when the parameters \(\boldsymbol{\xi }\in \mathcal{X}_{\xi }\) are used. The parameters, \(\boldsymbol{\xi }\), should specify everything that is required to provide a complete description of the system: any physical parameters, boundary conditions, external forcings, etc.

#### 2.1.1 Example: Dynamical Systems

Consider a dynamical system whose state is described by *q* variables. Such a model has no spatial component, *d*_{ s } = 0; a one-dimensional time component, *d*_{ t } = 1; and output dimension *d*_{ y } = *q*.

#### 2.1.2 Example: Partial Differential Equation

Consider a partial differential equation describing flow through a porous medium. In this case, the output dimension is *d*_{ y } = 4, with the outputs corresponding to the pressure and the *i*-th component, *i* = 1, 2, 3, of the velocity field of the fluid, respectively.

### 2.2 Computer Emulators

A *computer emulator*, **f**_{ c }(⋅ ), of a physical model, **f**(⋅ ), is a function that reports the physical model at a fixed set of spatial locations, **X**_{ s }, and time instants, **X**_{ t }. In the dynamical system example, **X**_{ t } may be the time steps on which the numerical integration routine reports the solution. In the porous flow example, **X**_{ s } may be the centers of the cells of a finite volume scheme and **X**_{ t } the time steps on which the numerical integration reports the solution.

Alternatively, the spatial variation of the response may be expressed in terms of known spatial basis functions, *ψ*_{ i }(**x**_{ s }). Then, one may think of the computer code as the function that returns the coefficients \(\mathbf{c}_{i}(t,\boldsymbol{\xi }) \in \mathbb{R}^{d_{y}}\) of this expansion.

### 2.3 The Uncertainty Propagation Problem

Let \(p(\boldsymbol{\xi })\) be a probability density quantifying one’s uncertainty about the *stochastic* input. The goal of uncertainty propagation is to study the effects of \(p(\boldsymbol{\xi })\) on the model output \(\mathbf{f}(\mathbf{x}_{s},t,\boldsymbol{\xi })\). Usually, it is sufficient to be able to compute low-order statistics, such as the *mean*, the *covariance* matrix function, and the *variance* of component *i* as a function of space and time.

The focus of this chapter is restricted to *non-intrusive* uncertainty propagation methods. These techniques estimate the statistics of the physical model using the computer code **f**_{ c }(⋅ ) as a black box. In particular, a fully Bayesian approach using Gaussian processes is developed .

### 2.4 The Bayesian Approach to Uncertainty Propagation

Suppose that one has performed *n* simulations and collected the data set \(\mathcal{D} =\{ (\mathbf{x}_{i},\mathbf{y}_{i})\}_{i=1}^{n}\), where **y**_{ i } = **f**(**x**_{ i }). The problem is to estimate the statistics of the response, **y**, using only the simulations in \(\mathcal{D}\).

Classic approaches to uncertainty propagation use \(\mathcal{D}\) to build a surrogate surface \(\hat{\mathbf{f}}(\mathbf{x})\) of the original model **f**(**x**). Then, they characterize the uncertainty on the response **y** by propagating the uncertainty of the stochastic inputs, \(\boldsymbol{\xi }\), through the surrogate. In some cases, e.g., gPC [26], this can be done analytically. In general, since the surrogate surface is cheap to evaluate, the uncertainty of the stochastic inputs, \(\boldsymbol{\xi }\), is propagated through it via a simple Monte Carlo procedure [31, 41]. Such a procedure can provide point estimates of any statistic. However, what can one say about the accuracy of these estimates? This question becomes important especially when the number of simulations, *n*, is very small. The Bayesian approach to uncertainty propagation can address this issue by providing confidence intervals for the estimated statistics.

#### 2.4.1 Bayesian Surrogates

The Bayesian approach builds, instead, a *Bayesian surrogate*. A Bayesian surrogate is a probability measure on the space of surrogates which is compatible with one’s prior beliefs about the nature of **f**(**x**) as well as the data \(\mathcal{D}\). A precise mathematical meaning of these concepts is given in the Gaussian process section. For the moment – and without loss of generality – assume that one has a parameterized family of surrogates, \(\hat{\mathbf{f}}(\cdot ;\boldsymbol{\theta })\), where \(\boldsymbol{\theta }\) is a finite-dimensional random variable with PDF \(p(\boldsymbol{\theta })\). Intuitively, think of \(\hat{\mathbf{f}}(\cdot ;\boldsymbol{\theta })\) as a candidate surrogate with parameters \(\boldsymbol{\theta }\) and that, before observing any data, \(\boldsymbol{\theta }\) may take any value compatible with the *prior* probability \(p(\boldsymbol{\theta })\). In addition, let \(p(\mathcal{D}\vert \boldsymbol{\theta })\) be the *likelihood* of the simulations under the model. Using Bayes’ rule, one may characterize his state of knowledge about \(\boldsymbol{\theta }\) via the *posterior* PDF:

$$\displaystyle{ p(\boldsymbol{\theta }\vert \mathcal{D}) = \frac{p(\mathcal{D}\vert \boldsymbol{\theta })p(\boldsymbol{\theta })} {p(\mathcal{D})}, }$$(15)

where \(p(\mathcal{D}) =\int p(\mathcal{D}\vert \boldsymbol{\theta })p(\boldsymbol{\theta })d\boldsymbol{\theta }\) is known as the *evidence*. The posterior of \(\boldsymbol{\theta }\), Equation (15), neatly encodes everything one has learned about the true response, **f**(⋅ ), after seeing the simulations in \(\mathcal{D}\). How can one use this information to characterize his state of knowledge about the statistics of **f**(⋅ )?

#### 2.4.2 Predictive Distribution of Statistics

A *statistic* is an operator \(\mathcal{Q}[\cdot ]\) that acts on the response surface. Examples of statistics are the mean, \(\mathbb{E}_{\boldsymbol{\xi }}[\cdot ]\), of Equation (11); the covariance, \(\mathbb{C}_{\boldsymbol{\xi }}[\cdot ]\), of Equation (12); and the variance, \(\mathbb{V}_{\boldsymbol{\xi }}[\cdot ]\), of Equation (13). Using the posterior of \(\boldsymbol{\theta }\) (see Equation (15)), the state of knowledge about an arbitrary statistic \(\mathcal{Q}[\cdot ]\) is characterized via

the *predictive distribution* of the statistic \(\mathcal{Q}[\cdot ]\) given \(\mathcal{D}\), denoted \(p(\mathcal{Q}\vert \mathcal{D})\). The uncertainty in \(p(\mathcal{Q}\vert \mathcal{D})\) corresponds to the *epistemic uncertainty* induced by one’s limited data budget. The Bayesian approach is the only one that can naturally characterize this epistemic uncertainty.

In practice, one can obtain samples from \(p(\mathcal{Q}\vert \mathcal{D})\) by repeating the following two steps:

- 1.
Sample a \(\boldsymbol{\theta }\) from \(p(\boldsymbol{\theta }\vert \mathcal{D})\) of Equation (15).

- 2.
Evaluate \(\mathcal{Q}[\hat{\mathbf{f}}(\cdot ;\boldsymbol{\theta })]\).
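These two steps can be illustrated with a deliberately simple setup: a hypothetical linear surrogate family and a Gaussian stand-in for the posterior over its parameters (both are assumptions made purely for illustration; the statistic is the mean under a uniform input).

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical posterior over surrogate parameters theta = (theta0, theta1):
# a Gaussian centered at (1.0, 2.0), standing in for p(theta | D).
def sample_posterior(size):
    return rng.normal(loc=[1.0, 2.0], scale=[0.1, 0.2], size=(size, 2))

# Candidate surrogate f_hat(x; theta) = theta0 + theta1 * x.
# Statistic of interest: Q[f] = E_xi[f(xi)] with xi ~ Uniform(0, 1),
# which is available in closed form: theta0 + theta1 / 2.
def Q(theta):
    return theta[0] + theta[1] / 2.0

# Step 1: sample theta^(s) from p(theta | D); step 2: evaluate Q[f_hat(.; theta^(s))].
thetas = sample_posterior(5000)
q_samples = np.array([Q(th) for th in thetas])

# The spread of q_samples quantifies the epistemic uncertainty in the statistic.
print(q_samples.mean(), q_samples.std())
```

Note that the output is a *distribution* over the statistic, not a single point estimate; its spread shrinks as more simulations are added to \(\mathcal{D}\).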

### 2.5 Gaussian Process Regression

For simplicity, the section starts by developing the theory for one-dimensional outputs, i.e., *d*_{ y } = 1.

#### 2.5.1 Modeling Prior Knowledge About the Response

An experienced scientist or engineer has some knowledge about the response function, *f*(⋅ ), even before running any simulations. For example, he might know that *f*(⋅ ) cannot exceed, or be smaller than, certain values, that it satisfies translation invariance, that it is periodic along certain inputs, etc. This knowledge is known as *prior knowledge*.

Prior knowledge can be *precise*, e.g., the response is exactly twice differentiable, the period of the first input is 2*π*, etc., or it can be *vague*, e.g., the probability that the period *T* takes any particular value is *p*(*T*), the probability that the length-scale \(\ell_{1}\) of the first input takes any particular value is \(p(\ell_{1})\), etc. When one is dealing with vague prior knowledge, he may refer to it as *prior belief*. Almost always, one’s prior knowledge about a computer code is a prior belief.

Prior knowledge about *f*(⋅ ) can be modeled by a probability measure on the space of functions from \(\mathcal{X}\) to \(\mathbb{R}\). This probability measure encodes one’s prior beliefs, in the sense that it assigns probability one to the set of functions that are consistent with it. A Gaussian process is a great way to represent this probability measure.

Specifically, assume that *f*(⋅ ) is a Gaussian process with *mean function* \(m : \mathcal{X} \rightarrow \mathbb{R}\) and *covariance function* \(k : \mathcal{X}\times \mathcal{X} \rightarrow \mathbb{R}\), i.e.,

$$\displaystyle{ f(\cdot )\vert m(\cdot ),k(\cdot ,\cdot ) \sim \mathop{\mathrm{GP}}\nolimits \left (f(\cdot )\vert m(\cdot ),k(\cdot ,\cdot )\right ), }$$(18)

meaning that, for any collection of input points \(\mathbf{X} =\{ \mathbf{x}_{1},\ldots ,\mathbf{x}_{n}\}\), the joint PDF of the corresponding responses is

$$\displaystyle{ p(\mathbf{f}(\mathbf{X})) = \mathcal{N}_{n}\left (\mathbf{f}(\mathbf{X})\vert \mathbf{m}(\mathbf{X}),\mathbf{k}(\mathbf{X},\mathbf{X})\right ), }$$(19)

where **m**(**X**) = (*m*(**x**_{ i }))_{ i }, \(\mathbf{k}(\mathbf{X},\mathbf{X}') = (k(\mathbf{x}_{i},\mathbf{x}_{j}'))_{ij}\), and \(\mathcal{N}_{n}(\cdot \vert \boldsymbol{\mu },\boldsymbol{\varSigma })\) is the probability density function of an *n*-dimensional multivariate normal random variable with mean vector \(\boldsymbol{\mu }\) and covariance matrix \(\boldsymbol{\varSigma }\).

But how does Equation (18) encode one’s prior knowledge about the code? It does so through the choice of the mean and the covariance functions. The mean function can be an arbitrary function. Its role is to encode any generic trends about the response. The covariance function can be any positive semi-definite function. It is used to model the signal strength of the response and how it varies across \(\mathcal{X}\), the similarity (correlation) of the response at two distinct input points, noise, regularity, periodicity, invariance, and more. The choice of mean and covariance functions is discussed more elaborately in what follows.
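To make the role of the covariance function concrete, the following sketch draws sample paths from a zero-mean GP prior with a squared exponential covariance (a common choice; the signal strength and length scale values are arbitrary assumptions). Shorter length scales would produce wigglier sample paths.

```python
import numpy as np

def squared_exponential(X1, X2, s=1.0, ell=0.2):
    """k(x, x') = s^2 exp(-(x - x')^2 / (2 ell^2)); ell controls the length scale."""
    d = X1[:, None] - X2[None, :]
    return s**2 * np.exp(-0.5 * (d / ell) ** 2)

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)

# Zero mean function and SE covariance define the GP prior (cf. Equation (18)).
K = squared_exponential(x, x) + 1e-10 * np.eye(x.size)  # tiny jitter for stability
L = np.linalg.cholesky(K)

# Each column is one sample path f ~ GP(0, k), drawn as L z with z ~ N(0, I).
samples = L @ rng.standard_normal((x.size, 3))
print(samples.shape)
```

Each sample path is one candidate response surface that is consistent with the prior; no data have been used yet.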

In general, the mean and covariance functions depend on parameters \(\boldsymbol{\psi }_{m}\) and \(\boldsymbol{\psi }_{k}\), respectively, collectively denoted by \(\boldsymbol{\psi } =\{\boldsymbol{\psi }_{m},\boldsymbol{\psi }_{k}\}\) and known as the *hyper-parameters* of the model. Using this notation, the most general model considered in this work is a GP prior over responses, \(f(\cdot )\vert \boldsymbol{\psi }\sim \mathop{\mathrm{GP}}\nolimits \left (f(\cdot )\vert m(\cdot ;\boldsymbol{\psi }_{m}),k(\cdot ,\cdot ;\boldsymbol{\psi }_{k})\right )\), combined with a hyper-prior \(p(\boldsymbol{\psi })\).

##### Choosing the Mean Function

A convenient choice for the mean function is a *generalized linear model*:

$$\displaystyle{ m(\mathbf{x};\boldsymbol{\psi }_{m}) = \mathbf{h}(\mathbf{x})^{T}\mathbf{b}, }$$(28)

where \(\mathbf{h}(\cdot ) = (h_{1}(\cdot ),\ldots ,h_{d_{h}}(\cdot ))\) are known basis functions and \(\mathbf{b} \in \mathbb{R}^{d_{h}}\) are the *weights*. A popular prior for the weights, **b**, is the improper, “non-informative,” uniform, \(p(\mathbf{b}) \propto 1\), which makes it possible to integrate **b** out of the model analytically [15]. Some examples of generalized linear models are:

- 1.
The *constant* mean function, *d*_{ h } = 1,$$\displaystyle{ h(x) = 1; }$$(31)
- 2.
The *linear* mean function, \(d_{h} = d + 1\),$$\displaystyle{ \mathbf{h}(\mathbf{x}) = (1,x_{1},\ldots ,x_{d}); }$$(32)
- 3.
The *generalized polynomial chaos* (gPC) mean function, in which \(\mathbf{h}(\cdot ) = (h_{1}(\cdot ),\ldots ,h_{d_{h}}(\cdot ))\), with the \(h_{i} : \mathbb{R}^{d} \rightarrow \mathbb{R},i = 1,\ldots ,d_{h}\) being polynomials with degree up to *ρ* and orthogonal with respect to a measure *μ*(⋅ ), i.e.,$$\displaystyle{ \int h_{i}(x)h_{j}(x)d\mu (x) =\delta _{ij}. }$$(33)For uncertainty propagation applications, *μ*(⋅ ) is usually a probability measure. For many well-known probability measures, the corresponding gPC is known [53]. For arbitrary probability measures, the polynomials can only be constructed numerically, e.g., [24]. Excellent Fortran code for the construction of orthogonal polynomials can be found in [25]. An easy-to-use Python interface is available by Bilionis [6].
- 4.
The *Fourier* mean function, defined for *d* = 1, in which \(\mathbf{h}(\cdot ) = (h_{1}(\cdot ),\ldots ,h_{d_{h}}(\cdot ))\), with the \(h_{i} : \mathbb{R} \rightarrow \mathbb{R},i = 1,\ldots ,d_{h}\) being trigonometric functions supporting certain frequencies *ω*_{ i }, i.e.,$$\displaystyle{ h_{2i}(x) =\sin (\omega _{i}x),\;\text{and}\;h_{2i+1}(x) =\cos (\omega _{i}x). }$$(34)

##### Choosing the Covariance Function

- 1.
Modeling measurement noise. Assume that measurements of *f*(**x**) are noisy and that this noise is Gaussian with variance \(\sigma ^{2}\). GPs can account for this fact if one adds a Kronecker delta-like function to the covariance, i.e., if one works with a covariance of the form:$$\displaystyle{ k(\mathbf{x},\mathbf{x}';\boldsymbol{\psi }_{k}) = k_{0}(\mathbf{x},\mathbf{x}';\boldsymbol{\psi }_{k_{0}}) +\sigma ^{2}\delta (\mathbf{x} -\mathbf{x}'). }$$(35)Note that \(\delta (\mathbf{x} -\mathbf{x}')\) here is one if and only if **x** and **x**′ correspond to exactly the same measurement. Even though most computer simulators do not return noisy outputs, some do, e.g., molecular dynamics simulations. So, being able to model noise is useful. Apart from that, it turns out that including a small \(\sigma ^{2}\), even when there is no noise, is beneficial because it can improve the stability of the computations. When \(\sigma ^{2}\) is included for this reason, it is known as a “nugget” or a “jitter.” In addition to the improved stability, the nugget can also lead to better predictive accuracy [27].
- 2.
Modeling regularity. It can be shown that the regularity properties of the covariance function are directly associated with the regularity of the functions sampled from the probability measure induced by the GP (Equations (18) and (23)). For example, if \(k(\mathbf{x},\mathbf{x};\boldsymbol{\psi }_{k})\) is continuous at **x**, then samples \(f(\cdot )\) from Equation (23) are continuous almost surely (a.s.) at **x**. If \(k(\mathbf{x},\mathbf{x};\boldsymbol{\psi })\) is *ρ* times differentiable at **x**, then samples \(f(\cdot )\) from Equation (23) are *ρ* times differentiable a.s. at **x**.
- 3.
Modeling invariance. If \(k(\cdot ,\cdot ;\boldsymbol{\psi }_{k})\) is invariant with respect to a transformation *T*, i.e., \(k(\mathbf{x},T\mathbf{x}';\boldsymbol{\psi }_{k}) = k(T\mathbf{x},\mathbf{x}';\boldsymbol{\psi }_{k}) = k(\mathbf{x},\mathbf{x}';\boldsymbol{\psi }_{k})\), then samples *f*(**x**) from Equation (23) are invariant with respect to the same transformation a.s. In particular, if \(k(\cdot ,\cdot ;\boldsymbol{\psi })\) is periodic, then samples *f*(**x**) from Equation (23) are periodic a.s.
- 4.
Modeling additivity. Assume that the covariance function is additive, i.e.,$$\displaystyle{ k(\mathbf{x},\mathbf{x}';\boldsymbol{\psi }_{k}) =\sum _{ i=1}^{d}k_{ i}(x_{i},x_{i}';\boldsymbol{\psi }_{k_{i}}) +\sum _{1\leq i<j\leq d}k_{ij}\left ((x_{i},x_{j}),(x_{i}',x_{j}');\boldsymbol{\psi }_{k_{ij}}\right )+\ldots , }$$(36)with \(\boldsymbol{\psi }_{k} =\{\boldsymbol{\psi } _{k_{i}}\} \cup \{\boldsymbol{\psi }_{k_{ij}}\}\). If \(f_{i}(\mathbf{x}),f_{ij}(\mathbf{x}),\ldots\) are samples from Equation (23) with covariances \(k_{i}(\cdot ,\cdot ;\boldsymbol{\psi }_{k_{i}})\), \(k_{ij}(\cdot ,\cdot ;\boldsymbol{\psi }_{k_{ij}})\), \(\ldots\), respectively, then$$\displaystyle{f(\mathbf{x}) =\sum _{ i=1}^{d}f_{ i}(x_{i}) +\sum _{1\leq i<j\leq d}f_{ij}(x_{i},x_{j})+\ldots ,}$$is a sample from Equation (23) with the additive covariance defined in Equation (36). These ideas can be used to deal effectively with high-dimensional inputs [22, 23].

A very common choice for \(k_{0}(\cdot ,\cdot ;\boldsymbol{\psi }_{k_{0}})\) is the squared exponential covariance,

$$\displaystyle{ k_{0}(\mathbf{x},\mathbf{x}';\boldsymbol{\psi }_{k_{0}}) = s^{2}\exp \left \{-\frac{1} {2}\sum _{i=1}^{d}\frac{(x_{i} - x_{i}')^{2}} {\ell_{i}^{2}} \right \}, }$$

where *s* > 0 may be interpreted as the *signal strength* and \(\ell_{i} > 0\) as the *length scale* of the \(i = 1,\ldots ,d\) input dimension. Prior beliefs about \(\boldsymbol{\psi }_{k} =\{\sigma ^{2}\} \cup \{\boldsymbol{\psi }_{k_{0}}\}\) are modeled by a hyper-prior \(p(\boldsymbol{\psi }_{k})\).

#### 2.5.2 Conditioning on Observations of the Response

As seen in the previous section, one’s prior knowledge about the response can be modeled in terms of a generic GP defined by Equations (23), (24), and (25). Now, assume that one makes *n* simulations at inputs \(\mathbf{x}_{1},\ldots ,\mathbf{x}_{n}\) and that he observes the responses \(y_{1} = f(\mathbf{x}_{1}),\ldots ,y_{n} = f(\mathbf{x}_{n})\). Write \(\mathbf{X} =\{ \mathbf{x}_{1},\ldots ,\mathbf{x}_{n}\}\) and \(\mathbf{Y} =\{ y_{1},\ldots ,y_{n}\}\) for the observed inputs and outputs, respectively. Abusing the mathematical notation slightly, the symbol \(\mathcal{D}\) is used to denote **X** and **Y** collectively (see Equation (14)). We refer to \(\mathcal{D}\) as the *(observed) data*. How does the observation of \(\mathcal{D}\) alter one’s state of knowledge about the response surface?

Applying Bayes’ rule, one obtains the *posterior* of the hyper-parameters:

$$\displaystyle{ p(\boldsymbol{\psi }\vert \mathcal{D}) = \frac{p(\mathcal{D}\vert \boldsymbol{\psi })p(\boldsymbol{\psi })} {p(\mathcal{D})}, }$$(43)

where \(p(\mathcal{D}\vert \boldsymbol{\psi })\) is the *likelihood* of \(\mathcal{D}\) induced by the defining property of the GP (Equation (19)) and \(p(\mathcal{D}) =\int p(\mathcal{D}\vert \boldsymbol{\psi })p(\boldsymbol{\psi })d\boldsymbol{\psi }\) is the evidence.
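For fixed hyper-parameters, conditioning the GP on the observed simulations yields closed-form posterior (predictive) formulas. The sketch below uses a squared exponential covariance and a zero prior mean (both illustrative assumptions), with a cheap stand-in for the expensive code; note how the posterior mean interpolates the observations and the posterior variance vanishes there.

```python
import numpy as np

def k(X1, X2, s=1.0, ell=0.3):
    """Squared exponential covariance (an illustrative choice)."""
    d = X1[:, None] - X2[None, :]
    return s**2 * np.exp(-0.5 * (d / ell) ** 2)

f = lambda x: np.sin(2 * np.pi * x)          # stand-in for the expensive code

X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])    # n = 5 simulations
Y = f(X)

jitter = 1e-8
Kxx = k(X, X) + jitter * np.eye(X.size)
L = np.linalg.cholesky(Kxx)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))

# Posterior mean and variance at test inputs, conditioned on D.
x_star = np.linspace(0.0, 1.0, 201)
Ks = k(x_star, X)
mean = Ks @ alpha
v = np.linalg.solve(L, Ks.T)
var = np.diag(k(x_star, x_star)) - np.sum(v**2, axis=0)

# At the training inputs the mean interpolates Y and the variance is ~0.
print(np.abs(mean[::50] - Y).max(), var[::50].max())
```

Away from the data, the variance grows back toward the prior signal strength, which is exactly the epistemic uncertainty the Bayesian surrogate is designed to expose.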

#### 2.5.3 Treating Space and Time by Using Separable Mean and Covariance Functions

The input **x** may include spatial **x**_{ s }; time, *t*; and stochastic, \(\boldsymbol{\xi }\), components (see Equation (1)). A computer code reports the response for a given \(\boldsymbol{\xi }\) (see Equation (7)) at a fixed set of *n*_{ s } spatial locations, **X**_{ s }, and *n*_{ t } time instants **X**_{ t } (see Equations (4) and (5), respectively). Now suppose that \(n_{\xi }\) observations of \(\boldsymbol{\xi }\) are to be made. Then, the size of the covariance matrix \(\mathbf{k}(\mathbf{X},\mathbf{X};\boldsymbol{\psi }_{k})\) used in Equation (19) becomes \((n_{\xi }n_{s}n_{t}) \times (n_{\xi }n_{s}n_{t})\). Since the cost of inference and prediction is \(O\left ((n_{s}n_{t}n_{\xi })^{3}\right )\), one encounters insurmountable computational issues even for moderate values of \(n_{\xi },n_{s}\), and *n*_{ t }. In an attempt to remedy this problem, simplifying assumptions must be made. Namely, one has to assume that the mean is a generalized linear model (see Equation (28)) with a *separable* set of basis functions, \(\mathbf{h}(\cdot )\), and that the covariance function is also separable.

Exploiting the fact that factorizations (e.g., Cholesky or QR) of a matrix formed by Kronecker products are given by the Kronecker products of the factorizations of the individual matrices [47], inference and predictions can be made in \(O(n_{s}^{3}) + O(n_{t}^{3}) + O(n_{\xi }^{3})\) time. For the complete details of this approach, the reader is directed to the appendix of Bilionis et al. [10]. It is worth mentioning that exactly the same idea was used by Stegle et al. [46].
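The Kronecker identity underlying this speedup can be checked numerically. The sketch below (with hypothetical factor sizes standing in for \(n_{s}\) and \(n_{t}\)) verifies that the Cholesky factor of \(\mathbf{K}_{s} \otimes \mathbf{K}_{t}\) is the Kronecker product of the individual Cholesky factors, and solves a linear system using only the small factors.

```python
import numpy as np

rng = np.random.default_rng(4)

def random_spd(n):
    """Random symmetric positive-definite matrix (stand-in for a covariance factor)."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

# Separable covariance: K = K_s (x) K_t, with hypothetical sizes n_s=6, n_t=8.
Ks, Kt = random_spd(6), random_spd(8)
K = np.kron(Ks, Kt)

# chol(K_s (x) K_t) = chol(K_s) (x) chol(K_t): the O((n_s n_t)^3) cost
# drops to O(n_s^3) + O(n_t^3).
Ls, Lt = np.linalg.cholesky(Ks), np.linalg.cholesky(Kt)
L_big = np.linalg.cholesky(K)
assert np.allclose(np.kron(Ls, Lt), L_big)

# Solving K v = b via the factors only: (K_s (x) K_t) vec(V) = vec(K_s V K_t^T).
b = rng.standard_normal(6 * 8)
B = b.reshape(6, 8)
V = np.linalg.solve(Ks, np.linalg.solve(Kt, B.T).T)  # V = K_s^-1 B K_t^-1
assert np.allclose(K @ V.reshape(-1), b)
```

The same factor-wise trick applies to eigendecompositions and determinants, which is what makes separable spatiotemporal GPs tractable.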

#### 2.5.4 Treating Space and Time by Using Output Dimensionality Reduction

An alternative way of dealing with spatiotemporal responses is to reduce the dimensionality of the output. Let *n*_{ y } = *n*_{ s }*n*_{ t } denote the number of outputs of the code, \(\mathbf{y} \in \mathbb{R}^{n_{y}}\) the full output, and \(\mathbf{z} \in \mathbb{R}^{n_{z}}\) the reduced output. The choice of the right dimensionality reduction map is an open research problem. Principal component analysis (PCA) [12, Chapter 12] is going to be used for the identification of the dimensionality reduction map. Consider the empirical covariance matrix \(\mathbf{C} \in \mathbb{R}^{n_{y}\times n_{y}}\),

$$\displaystyle{ \mathbf{C} = \frac{1} {n}\sum _{i=1}^{n}\left (\mathbf{Y}_{ i} -\mathbf{m}\right )\left (\mathbf{Y}_{i} -\mathbf{m}\right )^{T}, }$$

where **Y**_{ i } is the *i*-th row of the output matrix **Y** and \(\mathbf{m} \in \mathbb{R}^{n_{y}}\) is the empirical mean of the observed outputs, \(\mathbf{m} = \frac{1} {n}\sum _{i=1}^{n}\mathbf{Y}_{i}\). Consider the eigendecomposition of **C**,

$$\displaystyle{ \mathbf{C} = \mathbf{V}\mathbf{D}\mathbf{V}^{T}, }$$

where \(\mathbf{V} \in \mathbb{R}^{n_{y}\times n_{y}}\) contains the eigenvectors of **C** as columns, and \(\mathbf{D} \in \mathbb{R}^{n_{y}\times n_{y}}\) is a diagonal matrix with the eigenvalues of **C** on its diagonal. The PCA connection between **y** and **z** (reconstruction map) is given by:

$$\displaystyle{ \mathbf{y} \approx \mathbf{m} + \mathbf{V}\mathbf{z}. }$$(52)

One keeps the *n*_{ z } largest eigenvalues of **C** so that 95 %, or more, of the observed variance of **y** is explained. This is achieved by removing columns from **V** and columns and rows from **D**. The inverse of Equation (52) (reduction map) is given by:

$$\displaystyle{ \mathbf{z} = \mathbf{V}^{T}\left (\mathbf{y} -\mathbf{m}\right ). }$$

The reduced outputs **z** can be learned by any of the multi-output GP regression techniques to be introduced in subsequent sections.

### 2.6 Training the Parameters of the Gaussian Process

In general, the posterior of the hyper-parameters is analytically intractable and is represented via a *particle approximation*. A particle approximation is a collection of weights, *w*^{(s)}, and samples, \(\boldsymbol{\psi }^{(s)}\), with which one may represent the posterior as:

$$\displaystyle{ p(\boldsymbol{\psi }\vert \mathcal{D}) \approx \sum _{s=1}^{S}w^{(s)}\delta \left (\boldsymbol{\psi }-\boldsymbol{\psi }^{(s)}\right ), }$$(54)

where *w*^{(s)} ≥ 0 and \(\sum _{s=1}^{S}w^{(s)} = 1\). The usefulness of Equation (54) relies on the fact that it allows one to approximate expectations with respect to \(p(\boldsymbol{\psi }\vert \mathcal{D})\) in a straightforward manner.
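Concretely, under a particle approximation any posterior expectation becomes a weighted sum. The sketch below uses hypothetical Gaussian samples as a stand-in for posterior particles (an assumption purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical particle approximation of p(psi | D): S weighted samples.
S = 1000
psi = rng.normal(loc=0.5, scale=0.2, size=S)      # stand-in posterior samples
w = np.full(S, 1.0 / S)                           # equal weights summing to one

# Any posterior expectation E[g(psi) | D] reduces to a weighted sum (cf. Eq. (54)).
def expect(g):
    return np.sum(w * g(psi))

print(expect(lambda p: p), expect(lambda p: p**2))
```

The same one-liner covers means, variances, and any other moment, which is exactly why the particle representation is convenient downstream.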

#### 2.6.1 Maximization of the Likelihood

#### 2.6.2 Maximization of the Posterior

#### 2.6.3 Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) techniques [29, 31, 33] can be employed to sample from the posterior of Equation (43). These techniques produce a sequence of samples \(\boldsymbol{\psi }^{(s)},s = 1,\ldots ,S\), from Equation (43), which, after discarding a burn-in period and thinning, may be treated as approximately independent. The particle approximation is built by picking \(w^{(s)} = 1/S,s = 1,\ldots ,S\). MCMC is most useful when the posterior is unimodal.
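As an illustration, the sketch below runs a toy random-walk Metropolis chain over a single hyper-parameter, the log length-scale of a squared exponential covariance, with all other hyper-parameters fixed; the data, proposal scale, prior, and jitter are all illustrative assumptions rather than the chapter's actual setup.

```python
import numpy as np

rng = np.random.default_rng(6)

X = np.linspace(0.0, 1.0, 8)             # hypothetical simulation inputs
Y = np.sin(2 * np.pi * X)                # hypothetical observed responses

def log_post(log_ell):
    """Log posterior of the log length-scale: GP marginal log likelihood
    plus a standard normal prior on log_ell (illustrative assumptions)."""
    ell = np.exp(log_ell)
    d = X[:, None] - X[None, :]
    K = np.exp(-0.5 * (d / ell) ** 2) + 1e-6 * np.eye(X.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    log_lik = -0.5 * Y @ alpha - np.log(np.diag(L)).sum()
    return log_lik - 0.5 * log_ell**2

# Random-walk Metropolis: S samples, equal weights w^(s) = 1/S.
S, chain, current = 2000, [], 0.0
lp = log_post(current)
for _ in range(S):
    prop = current + 0.3 * rng.standard_normal()
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:    # Metropolis accept/reject
        current, lp = prop, lp_prop
    chain.append(current)

print(np.exp(np.mean(chain[S // 2:])))   # rough posterior summary of the length scale
```

In a real application one would monitor acceptance rates and convergence diagnostics, and sample all hyper-parameters jointly rather than a single one.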

#### 2.6.4 Sequential Monte Carlo

Sequential Monte Carlo (SMC) techniques construct a sequence of distributions that interpolate between the prior and the posterior, indexed by a parameter *γ* ∈ [0, 1]:

$$\displaystyle{ p_{\gamma }(\boldsymbol{\psi }\vert \mathcal{D}) \propto p(\mathcal{D}\vert \boldsymbol{\psi })^{\gamma }p(\boldsymbol{\psi }), }$$(60)

so that for *γ* = 0 one obtains the prior and for *γ* = 1 one obtains the posterior. The idea is to start a particle approximation \(\left \{w_{0}^{(s)},\boldsymbol{\psi }_{0}^{(s)}\right \}_{s=1}^{S}\) from the prior (*γ* = 0), which is very easy to sample from, and gradually move it to the posterior (*γ* = 1). This can be easily achieved if there is an underlying MCMC routine for sampling from Equation (60). In addition, the schedule of *γ*’s can be picked adaptively. The reader is directed to the appendix of Bilionis and Zabaras [9] for the complete details. SMC-based methodologies are suitable for multimodal posteriors. Another very attractive attribute is that they are embarrassingly parallelizable.

### 2.7 Multi-output Gaussian Processes

In what follows, the treatment of generic *d*_{ y }-dimensional response functions \(\mathbf{f}(\cdot )\) is considered. The collection of all observed outputs is denoted by \(\mathbf{Y}_{d} = (\mathbf{y}_{1}^{T},\ldots ,\mathbf{y}_{n}^{T}) \in \mathbb{R}^{n\times d_{y}}\). The proper treatment of multiple outputs in a computationally efficient way is an open research problem. Even though there exist techniques that attempt to capture nontrivial dependencies between the various outputs, e.g., [2, 3, 13, 46], here the focus is on computationally simple techniques that treat outputs as either independent or linearly dependent. These techniques tend to be the easiest to use in practice.

#### 2.7.1 Completely Independent Outputs

The simplest approach is to assign a separate, independent GP, with its own mean and covariance function, to the *i*-th output of \(\mathbf{f}(\cdot )\). The likelihood of the model then factorizes over the outputs.

#### 2.7.2 Independent, but Similar, Outputs

In this model, \(\boldsymbol{\psi }_{m,i}\) are the mean parameters of the *i*-th output, and \(\boldsymbol{\psi }_{k}\) are the parameters of the covariance function shared by all outputs. The likelihood of this model is given by an equation similar to Equation (63) with an appropriate mean and covariance function. The prior of \(\boldsymbol{\psi }\) is assumed to have an a priori independence structure similar to Equation (64). The advantage of this approach compared with the fully independent approach is that all outputs share the same covariance function, and, hence, its computational complexity is the same as that of a single-output Gaussian process regression.

#### 2.7.3 Linearly Correlated Outputs

Finally, one may model \(\mathbf{f}(\cdot )\) as a *d*_{ y }-dimensional Gaussian random field, in which a positive-definite matrix \(\boldsymbol{\varSigma }\in \mathbb{R}^{d_{y}\times d_{y}}\) captures the linear correlations between the *d*_{ y } outputs, and \(k(\cdot ,\cdot ;\boldsymbol{\psi }_{k})\) is a common covariance function. Equation (67) essentially means that, a priori, the outputs are linearly correlated through \(\boldsymbol{\varSigma }\), while their dependence on the inputs is governed by the shared covariance function *k*.

### 2.8 Sampling Possible Surrogates

Conditional on the hyper-parameters, candidate surrogates \(\hat{f}(\cdot ;\boldsymbol{\theta })\) can be expressed *analytically*. In particular, Equation (74) writes \(\hat{f}(\cdot ;\boldsymbol{\theta })\) in terms of a set of *n*_{ d } *design* points, **X**_{ d }, in \(\mathcal{X}\), and a matrix \(\mathbf{C}(\boldsymbol{\psi }) \in \mathbb{R}^{n_{d}\times d_{\omega }}\) that corresponds to a factorization of the posterior covariance function of Equation (43) over the design points **X**_{ d }. The optimal choice of the design points, **X**_{ d }, is an open research problem. Below, some heuristics are provided. These depend on how one actually constructs the various quantities of Equation (74).

#### 2.8.1 The Karhunen-Loève Approach for Constructing \(\hat{f}(\cdot ;\boldsymbol{\theta })\)

In the Karhunen-Loève (KL) approach, one truncates the KL expansion of the posterior Gaussian process at *d*_{ ω } terms and writes the sampled surrogate in terms of the leading eigenfunctions of the posterior covariance.

Equation (81) is a Fredholm integral eigenvalue problem. In general, this equation cannot be solved analytically. A very recent study of the numerical techniques that can be used for the solution of this problem can be found in Betz et al. [5]. This work relies on the Nyström approximation [20, 40] and follows Betz et al. [5] closely in its development.

In the simplest such rule, the quadrature points **x**_{d, j} are randomly picked in \(\mathcal{X}\) and the weights are \(w_{j} = \frac{1} {n_{d}}\). Other choices could be based on tensor products of one-dimensional rules or a sparse grid quadrature rule [44]. As shown later on, for the special – but very common – case of a separable covariance function, the difficulty of the problem can be reduced dramatically.

Here the covariance matrix is evaluated at the design points **X**_{ d } (see Equation (75)), and \(\mathbf{W} =\mathop{ \mathrm{diag}}\nolimits \left (w_{1},\ldots ,w_{n_{d}}\right )\) collects the quadrature weights. It is easy to see that the solution of Equation (86) can be obtained by first solving the regular, symmetric eigenvalue problem for the matrix **B** of Equation (88).

One may choose *d*_{ ω } ≤ *n*_{ d } so that *α* % of the energy of the field is captured, e.g., \(\alpha = 90\) %. That is, one may pick *d*_{ ω } so that the sum of the *d*_{ ω } largest eigenvalues accounts for at least *α* % of the sum of all the eigenvalues.

The coefficients of the truncated expansion are then constructed from the first *d*_{ ω } eigenvectors of **B** of Equation (88), i.e., from the matrix that has the first *d*_{ ω } eigenvectors as columns.
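The Nyström discretization described above can be sketched as follows: random quadrature points with equal weights, the symmetrized eigenproblem for \(\mathbf{B} = \mathbf{W}^{1/2}\,\mathbf{k}(\mathbf{X}_{d},\mathbf{X}_{d})\,\mathbf{W}^{1/2}\), truncation at \(\alpha = 90\,\%\) energy, and the Nyström extension of the eigenvectors to arbitrary inputs. The kernel, domain, and \(n_{d}\) are illustrative assumptions, and the symmetrization used is the standard Nyström recipe rather than the chapter's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(7)

def k(X1, X2, ell=0.2):
    """Illustrative squared exponential covariance."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

# Quadrature rule: n_d random points in X = [0, 1], weights w_j = 1 / n_d.
n_d = 300
x_d = rng.uniform(0.0, 1.0, n_d)
w = np.full(n_d, 1.0 / n_d)

# Symmetrized discrete eigenproblem: B = W^{1/2} K W^{1/2}.
sqrt_w = np.sqrt(w)
B = sqrt_w[:, None] * k(x_d, x_d) * sqrt_w[None, :]
lam, U = np.linalg.eigh(B)
lam, U = lam[::-1], U[:, ::-1]          # descending eigenvalues

# Truncate at d_omega terms capturing alpha = 90% of the field's energy.
alpha = 0.90
energy = np.cumsum(lam) / lam.sum()
d_omega = int(np.searchsorted(energy, alpha) + 1)

# Nystrom extension of the discrete eigenvectors to arbitrary x.
def phi(x):
    return k(x, x_d) @ (w[:, None] * (U[:, :d_omega] / sqrt_w[:, None])) / lam[:d_omega]

print(d_omega, lam[:3])
```

At the quadrature points themselves the extension reproduces the discrete eigenvectors exactly, so `phi` smoothly interpolates the eigenfunctions between them.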

#### 2.8.2 The O’Hagan Approach for Constructing \(\hat{f}(\cdot ;\boldsymbol{\theta })\)

Consider the **X**_{ d } design points defined in Equation (80). Let \(\mathbf{Y}_{d} \in \mathbb{R}^{n_{d}\times 1}\) be the random variable corresponding to the unobserved output of the simulation on the design points **X**_{ d }.

If *n*_{ d } is sufficiently large, then the posterior covariance *k*^{∗∗}(**x**, **x**′) is very small and, thus, negligible. In other words, if **X**_{ d } is dense enough, then all the probability mass of Equation (98) is accumulated around the mean \(m^{{\ast}{\ast}}(\cdot ;\boldsymbol{\psi },\mathbf{Y}_{d})\). Therefore, one may think of the mean \(m^{{\ast}{\ast}}(\cdot ;\boldsymbol{\psi },\mathbf{Y}_{d})\) as a sample surface from Equation (40).

To build such a sample surface, one first samples **Y**_{ d } from its predictive distribution. Since *n*_{ d } is expected to be quite large, it is not a good idea to use all *n*_{ d } design points in **X**_{ d } to build a functional sample. Apart from the increased computational complexity, the Cholesky decomposition of the large covariance matrix of Equation (100) introduces numerical instabilities. A heuristic that can be used to construct a subset of design points, **X**_{d, s}, from the original, dense, set of design points, **X**_{ d }, without sacrificing accuracy, is now discussed. The idea is to iteratively select the points of **X**_{ d } with maximum predictive variance, until the maximum variance falls below a threshold *ε* > 0. The algorithm is as follows:

- 1.
Start with$$\displaystyle{\mathbf{X}_{d,s} =\{\} .}$$
- 2.
If$$\displaystyle{\vert \mathbf{X}_{d,s}\vert = n_{d},}$$then STOP. Otherwise, CONTINUE.
- 3.
Find$$\displaystyle{i^{{\ast}} =\arg \max _{ 1\leq j\leq n_{d}}k^{{\ast}{\ast}}(\mathbf{x}_{ d,j},\mathbf{x}_{d,j};\boldsymbol{\psi },\mathbf{X}_{d,s}),}$$where \(k^{{\ast}{\ast}}(\mathbf{x},\mathbf{x}';\boldsymbol{\psi },\mathbf{X}_{d,s})\) is the covariance function defined in Equation (100) if **X**_{d, s} is used instead of **X**_{ d }.
- 4.
If$$\displaystyle{k^{{\ast}{\ast}}(\mathbf{x}_{ d,i^{{\ast}}},\mathbf{x}_{d,i^{{\ast}}};\boldsymbol{\psi },\mathbf{X}_{d,s}) >\epsilon ,}$$then$$\displaystyle{\mathbf{X}_{d,s} \leftarrow \mathbf{X}_{d,s} \cup \{\mathbf{x}_{d,i^{{\ast}}}\},}$$and GO TO 2. Otherwise, STOP.

Notice that when one includes a new point \(\mathbf{x}_{d,i^{{\ast}}}\), he has to compute the Cholesky decomposition of the covariance matrix \(\mathbf{k}^{{\ast}}(\mathbf{X}_{d,s} \cup \{\mathbf{x}_{d,i^{{\ast}}}\},\mathbf{X}_{d,s} \cup \{\mathbf{x}_{d,i^{{\ast}}}\};\boldsymbol{\psi })\). This can be done efficiently using rank-one updates of the covariance matrix (see Seeger [43]).
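The max-variance selection loop can be sketched as follows. For clarity, the posterior variance is recomputed from scratch at each step rather than via the rank-one updates mentioned above, and the kernel, candidate grid, and threshold are illustrative assumptions.

```python
import numpy as np

def k(X1, X2, ell=0.2):
    """Illustrative squared exponential covariance."""
    d = X1[:, None] - X2[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

x_d = np.linspace(0.0, 1.0, 200)          # dense candidate design points X_d
eps = 1e-3                                 # variance threshold epsilon

selected = []                              # indices into x_d forming X_{d,s}
while len(selected) < x_d.size:
    if not selected:
        var = np.diag(k(x_d, x_d)).copy()  # empty subset: prior variance k(x, x)
    else:
        Xs = x_d[selected]
        Kss = k(Xs, Xs) + 1e-10 * np.eye(len(selected))
        Ksx = k(Xs, x_d)
        # Posterior variance k**(x, x) given the current subset X_{d,s}.
        var = np.diag(k(x_d, x_d)) - np.sum(Ksx * np.linalg.solve(Kss, Ksx), axis=0)
    i_star = int(np.argmax(var))           # step 3: point of maximum variance
    if var[i_star] <= eps:                 # step 4: stop when below threshold
        break
    selected.append(i_star)

print(len(selected))                       # far fewer than the 200 candidates
```

The loop naturally spreads the selected points out, since the variance collapses near points already in the subset; with rank-one Cholesky updates the per-iteration cost drops accordingly.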

### 2.9 Semi-analytic Formulas for the Mean and the Variance

It is quite obvious how Equation (74) can be used to obtain samples from the predictive distribution, \(p(\mathcal{Q}\vert \mathcal{D})\), of a statistic of interest \(\mathcal{Q}[\cdot ]\). Thus, using a Monte Carlo procedure, one can characterize one’s uncertainty about any statistic of the response surface. This could become computationally expensive in the case of high dimensions and many observations, albeit less expensive than evaluating these statistics using the simulator itself. Fortunately, as shown in this section, it is actually possible to evaluate exactly the predictive distribution for the mean statistic, Equation (11), since it turns out to be Gaussian. Furthermore, it is possible to derive the predictive mean and variance of the covariance statistic, Equation (12).
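The Gaussianity of the mean statistic can be made explicit: \(\mathbb{E}[f(\cdot )]\) is a linear functional of the posterior GP, so, schematically, with \(m^{{\ast}}\) and \(k^{{\ast}}\) denoting the posterior mean and covariance functions and \(p(\mathbf{x})\) the input density,

$$\displaystyle{\mathbb{E}[f(\cdot )]\,\vert \,\mathcal{D}\sim \mathcal{N}\left (\int m^{{\ast}}(\mathbf{x})p(\mathbf{x})\,d\mathbf{x},\;\iint k^{{\ast}}(\mathbf{x},\mathbf{x}')p(\mathbf{x})p(\mathbf{x}')\,d\mathbf{x}\,d\mathbf{x}'\right ).}$$

This is a sketch in generic notation; the chapter's Equations (11) and (74) fix the precise quantities involved.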

#### 2.9.1 One-Dimensional Output with No Spatial or Time Inputs

Assume that the output is one-dimensional, i.e., \(d_{y} = 1\), and that there are no spatial or time inputs, i.e., \(d_{s} = d_{t} = 0\). In this case, one has \(\mathbf{x} =\boldsymbol{\xi }\), \(\mathbf{X}_{d} =\boldsymbol{\varXi } _{d}\), and one may simply rewrite Equation (74) accordingly; the required expectations reduce to one-dimensional integrals.

#### 2.9.2 One-Dimensional Output with Spatial and/or Time Inputs

Consider the case of one-dimensional output, i.e., *d*_{ y } = 1, with possible spatial and/or time inputs, i.e., *d*_{ s }, *d*_{ t } ≥ 0. In this generic case, \(\mathbf{x} = (\mathbf{x}_{s},t,\boldsymbol{\xi })\).

Equations similar to Equation (111) can be derived without difficulty for the covariance statistic of Equation (12) as well as for the variance statistic of Equation (13) [10, 14]. In contrast to the mean statistic, however, the resulting random field is not Gaussian. That is, an equation similar to Equation (112) does not hold.

## 3 Numerical Examples

### 3.1 Synthetic One-Dimensional Example

Consider a synthetic example with one-dimensional input and output and no spatial or time inputs, i.e., \(d_{s} = d_{t} = 0\). That is, the input is just \(x =\xi\). Consider *n* = 7 arbitrary observations, \(\mathcal{D} = \left \{\left (x^{(i)},y^{(i)}\right )\right \}_{i=1}^{n}\), which are shown as crosses in Fig. 1a. The goal is to use these seven observations to learn the underlying response function *y* = *f*(*x*) and characterize one’s state of knowledge about the mean \(\mathbb{E}[f(\cdot )]\) (Equation (11)), the variance \(\mathbb{V}[f(\cdot )]\) (Equation (13)), and the induced probability density function of the response *y*,
$$\displaystyle{p(y) =\int \delta \left (y - f(x)\right )p(x)\,dx,}$$
where *p*(*x*) is the input probability density, taken to be a Beta(10, 2) and shown in Fig. 1b.

The first step is to assign a prior Gaussian process to the response (Equation (18)). This is done by picking a zero mean and an SE covariance function (Equation (37)) with no nugget, \(\sigma ^{2} = 0\) in Equation (35), and fixed signal and length-scale parameters to *s* = 1 and \(\ell= 0.1\), respectively. These choices represent one’s prior beliefs about the underlying response function *y* = *f*(*x*).

Conditioning the prior GP on the observations yields the posterior GP of Equation (40), with posterior mean \(m^{{\ast}}(x)\) and posterior standard deviation \(\sigma ^{{\ast}}(x)\). A 95 % predictive interval of the response at an input *x* is given, approximately, by \(\left (m^{{\ast}}(x) - 1.96\sigma ^{{\ast}}(x),m^{{\ast}}(x) + 1.96\sigma ^{{\ast}}(x)\right )\). The posterior mean can be thought of as a point estimate of the underlying response surface.

In order to sample possible surrogates from Equation (40), the Karhunen-Loève approach for constructing \(\hat{f}(\cdot ;\boldsymbol{\theta })\) is followed (see Equations (74) and (83)), retaining *d*_{ ω } = 3 eigenfunctions (see Equations (81) and (91)) of the posterior covariance which account for more than *α* = 90 % of the energy of the posterior GP (see Equation (92)). These eigenfunctions are shown in Fig. 1c. Using the constructed \(\hat{f}(\cdot ;\boldsymbol{\theta })\), one can sample candidate surrogates. Three such samples are shown as solid black lines in Fig. 1a.

Having constructed a finite dimensional representation of the posterior GP, one is in a position to characterize one’s state of knowledge about arbitrary statistics of the response, which is captured by Equation (17). Here the suggested two-step procedure is followed. That is, candidate surrogates are repeatedly sampled, and then the statistics of interest are computed for each sample. In the results presented, 1,000 sampled candidate surrogates are used. Figure 1d shows the predictive probability density for the mean of the response \(p\left (\mathbb{E}[f(\cdot )]\vert \mathcal{D}\right )\). Note that this result can also be obtained semi-analytically using Equation (104). Figure 1e shows the predictive probability density for the variance of the response \(p\left (\mathbb{V}[f(\cdot )]\vert \mathcal{D}\right )\), which cannot be approximated analytically. Finally, subfigure (f) of the same figure characterizes the predictive distribution of the PDF of the response *p*(*y*). Specifically, the blue dashed line corresponds to the median of the PDFs of each one of the 1,000 sampled candidate surrogates, while the gray shaded area corresponds to a 95 % predictive interval around the median. The solid black lines of the same figure are the PDFs of three arbitrary sampled candidate surrogates.
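The two-step procedure can be sketched as follows. The observation locations and values below are hypothetical stand-ins for the seven crosses of Fig. 1a, and the posterior is sampled by a grid-based Cholesky factorization rather than the chapter's Karhunen-Loève construction; only the prior choices (zero mean, SE covariance with \(s = 1\), \(\ell = 0.1\), no nugget) and the Beta(10, 2) input density come from the text.

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)

def se_cov(X1, X2, s=1.0, ell=0.1):
    """Squared-exponential covariance on one-dimensional inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return s ** 2 * np.exp(-0.5 * d2 / ell ** 2)

# Hypothetical stand-ins for the n = 7 observations of Fig. 1a.
x_obs = np.array([0.05, 0.20, 0.40, 0.55, 0.70, 0.85, 0.95])
y_obs = np.sin(4.0 * np.pi * x_obs)

# Posterior GP: zero prior mean, SE covariance, no nugget
# (a tiny jitter is added for numerical stability only).
K = se_cov(x_obs, x_obs) + 1e-10 * np.eye(7)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))

xg = np.linspace(0.0, 1.0, 300)                  # quadrature grid
Ks = se_cov(xg, x_obs)
m_post = Ks @ alpha                              # posterior mean on the grid
A = np.linalg.solve(L, Ks.T)
C_post = se_cov(xg, xg) - A.T @ A + 1e-6 * np.eye(300)

# Step 1: sample candidate surrogates from the posterior GP.
Lc = np.linalg.cholesky(C_post)
samples = m_post[:, None] + Lc @ rng.standard_normal((300, 1000))

# Step 2: compute the mean statistic E[f] = \int f(x) p(x) dx for each
# sample, with p(x) the Beta(10, 2) input density (simple Riemann sum).
w = beta(10, 2).pdf(xg)
dx = xg[1] - xg[0]
mean_stat = dx * (w[:, None] * samples).sum(axis=0)
```

A histogram of `mean_stat` then characterizes \(p\left (\mathbb{E}[f(\cdot )]\vert \mathcal{D}\right )\), and the same samples can be reused for the variance statistic or the PDF of the response.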

### 3.2 Dynamical System Example

Consider a dynamical system with a three-dimensional response, i.e., \(d_{y} = 3\). For each choice of \(\boldsymbol{\xi }\), the computer emulator, \(\mathbf{f}_{c}(\boldsymbol{\xi })\) of Equation (7), reports the response at \(n_{t} = 20\) equidistant time steps in [0, 10], the set **X**_{ t } of Equation (5). The results of \(n_{\xi }\) randomly picked simulations are observed, and one wants to characterize one's state of knowledge about the statistics of the response. Consider the cases of \(n_{\xi } = 70, 100\), and 150. Note that propagating uncertainty through this dynamical system is not trivial, since there exists a discontinuity in the response surface as \(\xi _{1}\) crosses zero.

The prior GP is picked to be a multi-output GP with linearly correlated outputs, Equation (67), with a constant mean function, \(h(t,\boldsymbol{\xi }) = 1\), and a separable covariance function, Equation (46), with both the time and stochastic covariance functions being SE, Equation (37), with nuggets, Equation (35). Denote the hyper-parameters of the time and stochastic parts of the covariance by \(\boldsymbol{\psi }_{t} =\{\ell _{t},\sigma _{t}\}\) and \(\boldsymbol{\psi }_{\xi } =\{\ell _{\xi ,1},\ell_{\xi ,2},\sigma _{\xi }\}\), respectively. An exponential prior is assigned to each of them, albeit with different rate parameters. Specifically, the rate of \(\ell_{t}\) is 2, the rate of \(\ell_{\xi ,i},i = 1,2\), is 20, and the rate of the nuggets \(\sigma _{t}\) and \(\sigma _{\xi }\) is 10^{6}. This assignment corresponds to the vague prior knowledge that the a priori mean of the time scale is about 0.5 of the time unit, the scale of \(\boldsymbol{\xi }\) is about 0.05 of its unit, and the nuggets are expected to be around 10^{−6}. According to the comment below Equation (70), the signal strength can be picked to be identically equal to one, since it is absorbed by the covariance matrix \(\boldsymbol{\varSigma }\). For the hyper-parameter of the mean function, i.e., the constant itself, a flat uninformative prior is assigned. As already discussed, with this choice it is possible to integrate it out of the model analytically.
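The computational appeal of the separable covariance of Equation (46) is that the covariance matrix over all time steps and simulations is a Kronecker product, so only the small factors ever need to be factorized. The sketch below illustrates this for the sizes of this example; the length scales and nuggets are illustrative values (not the posterior ones), the signal strength is one as in the text, and the output correlation matrix \(\boldsymbol{\varSigma }\) is omitted for brevity.

```python
import numpy as np

def se_cov(X1, X2, ell):
    """Squared-exponential covariance with unit signal strength."""
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / ell ** 2)

nt, nxi = 20, 70                                      # time steps, simulations
T = np.linspace(0.0, 10.0, nt).reshape(-1, 1)         # X_t of Equation (5)
Xi = np.random.default_rng(1).uniform(-1.0, 1.0, size=(nxi, 2))

Kt = se_cov(T, T, ell=0.5) + 1e-6 * np.eye(nt)        # time part with nugget
Kxi = se_cov(Xi, Xi, ell=0.05) + 1e-6 * np.eye(nxi)   # stochastic part with nugget

# The Cholesky factor of the (nt*nxi) x (nt*nxi) covariance is obtained
# from the factors of the small matrices: kron(Lt, Lxi) is lower triangular
# and satisfies kron(Lt, Lxi) kron(Lt, Lxi)^T = kron(Kt, Kxi).
Lt, Lxi = np.linalg.cholesky(Kt), np.linalg.cholesky(Kxi)
L_full = np.kron(Lt, Lxi)
```

This reduces the factorization cost from \(O((n_{t}n_{\xi })^{3})\) to \(O(n_{t}^{3} + n_{\xi }^{3})\).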

The model is trained by sampling the posterior of \(\boldsymbol{\psi }=\{\boldsymbol{\psi } _{t},\boldsymbol{\psi }_{\xi }\}\) (see Equation (43)) using a mixed MCMC-Gibbs scheme (see [10] for a discussion of the scheme and evidence of convergence). After the MCMC chain has sufficiently mixed (this takes about 500 iterations), a particle approximation of the posterior state of knowledge about the response surface is constructed as follows. For every 100th step of the MCMC chain (the intermediate 99 steps are dropped to reduce correlations), 100 candidate surrogates are drawn using the O’Hagan procedure with a tolerance of \(\epsilon = 10^{-2}\).

The state of knowledge about the PDF of *y*_{2}(*t*) is summarized in the corresponding figure: the four rows correspond to four different time instants, *t* = 4, 6, 8, and 10, and the columns refer to different sample sizes of \(n_{\xi } = 70, 100\), and 150, counting from the left.

### 3.3 Partial Differential Equation Example

In this example, it is shown how the Bayesian approach to uncertainty propagation can be applied to a partial differential equation. In particular, a two-dimensional (\(\mathcal{X}_{s} = [0,1]^{2}\) and *d*_{ s } = 2), single-phase, steady-state (*d*_{ t } = 0) flow through an uncertain permeability field is studied; see Aarnes et al. [1] for a review of the underlying physics and solution methodologies. The uncertainty in the permeability is represented by a truncated KLE of an exponentiated Gaussian random field with zero mean and an exponential covariance function of signal strength equal to one and correlation length equal to 0.1. The total number of stochastic variables corresponds to the truncation order of the KLE, which is chosen to be \(d_{\xi } = 50\). Three outputs, *d*_{ y } = 3, are considered: the pressure, \(p(\mathbf{x}_{s};\boldsymbol{\xi })\), and the horizontal and vertical components of the velocity field, \(u(\mathbf{x}_{s};\boldsymbol{\xi })\) and \(v(\mathbf{x}_{s};\boldsymbol{\xi })\), respectively. The emulator, \(\mathbf{f}_{c}(\boldsymbol{\xi })\) of Equation (7), is based on the finite element method and is described in detail in [10]; it reports the response on a regular 32 × 32 spatial grid, i.e., \(n_{s} = 32^{2} = 1{,}024\). The objective is to quantify the statistics of the response using a limited number of \(n_{\xi } = 24, 64\), and 120 simulations. The results are validated by comparing against a plain vanilla MC estimate of the statistics using 108,000 samples.
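The uncertain input of this example can be reproduced schematically with a discrete (Nyström-type) Karhunen-Loève expansion of the log-permeability on the 32 × 32 grid, truncated after 50 terms. This is an illustrative sketch under those assumptions, not the chapter's exact discretization; only the covariance family, the correlation length, the signal strength, and the truncation order come from the text.

```python
import numpy as np

n = 32                                                # 32 x 32 spatial grid
g1, g2 = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n))
P = np.column_stack([g1.ravel(), g2.ravel()])         # n_s = 1,024 grid points

# Exponential covariance: signal strength 1, correlation length 0.1.
dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
C = np.exp(-dist / 0.1)

# Discrete KLE: eigen-decomposition of the covariance matrix, sorted in
# descending order and truncated after the first 50 terms.
lam, phi = np.linalg.eigh(C)
lam, phi = lam[::-1], phi[:, ::-1]
d_kle = 50

# One sample of the permeability: Gaussian field, then exponentiate.
xi = np.random.default_rng(2).standard_normal(d_kle)  # stochastic variables
log_perm = phi[:, :d_kle] @ (np.sqrt(lam[:d_kle]) * xi)
perm = np.exp(log_perm)
```

Each draw of the 50 standard normal variables `xi` yields one permeability field to feed to the flow solver.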

As in the previous example, the prior state of knowledge is represented using a multi-output GP with linearly correlated outputs, Equation (67); a constant mean function, \(h(\mathbf{x}_{s},\boldsymbol{\xi }) = 1\); and a separable covariance function, Equation (46), with both the space and stochastic covariance functions being SE, Equation (37), with nuggets, Equation (35). Denote the hyper-parameters of the spatial and stochastic parts of the covariance by \(\boldsymbol{\psi }_{s} =\{\ell _{s,1},\ell_{s,2},\sigma _{s}\}\) and \(\boldsymbol{\psi }_{\xi } =\{\ell _{\xi ,1},\ldots ,\ell_{\xi ,50},\sigma _{\xi }\}\), respectively. The fact that the spatial component is itself separable is exploited to significantly reduce the computational cost of the calculations. Again, exponential priors are assigned. The rate parameters of the spatial length scales are 100, corresponding to an a priori expectation of 0.01 spatial units. The rates of \(\ell_{\xi ,i}\), \(\sigma _{s}\), and \(\sigma _{\xi }\) are 3, 100, and 100, respectively.

The posterior of the hyper-parameters, Equation (43), is sampled using 100,000 iterations of the same MCMC-Gibbs procedure as in the previous example. However, in order to reduce the computational burden, a single-particle MAP approximation to the posterior, Equation (59), is constructed by searching for the MAP over the 100,000 MCMC-Gibbs samples collected. Then, 100 candidate surrogate surfaces are sampled following the O’Hagan procedure with a tolerance of \(\epsilon = 10^{-2}\). For each sampled surrogate, the statistics of interest are calculated and compared to the MC estimates.

## 4 Conclusions

In this chapter we presented a comprehensive review of the Bayesian approach to the UP problem that is able to quantify the epistemic uncertainty induced by a limited number of simulations. The core idea was to interpret a GP as a probability measure on the space of surrogates which characterizes our prior state of knowledge about the response surface. We focused on practical aspects of GPs such as the treatment of spatiotemporal variation and multi-output responses. We showed how the prior GP can be conditioned on the observed simulations to obtain a posterior GP, whose probability mass corresponds to the epistemic uncertainty introduced by the limited number of simulations, and we introduced sampling-based techniques that allow for its quantification.

Despite the successes of the current state of the Bayesian approach to the UP problem, there is still a wealth of open research questions. First, carrying out GP regression in high dimensions is not a trivial problem since it requires the development of application-specific covariance functions. The study of covariance functions that automatically perform some kind of internal dimensionality reduction seems to be a promising step forward. Second, in order to capture sharp variations in the response surface, such as localized bumps or even discontinuities, there is a need for flexible nonstationary covariance functions or alternative approaches based on mixtures of GPs, e.g., see [14]. Third, there is a need for computationally efficient ways of treating nonlinear correlations between distinct model outputs, since this is expected to squeeze more information out of the simulations. Fourth, as a semi-intrusive approach, the mathematical models describing the physics of the problem could be used to derive physics-constrained covariance functions that would, presumably, force the prior GP probability measure to be compatible with known response properties, such as mass conservation. That is, such an approach would put more effort into better representing our prior state of knowledge about the response. Fifth, there is an evident need for developing simulation selection policies which are specifically designed to gather information about the uncertainty propagation task. Finally, note that the Bayesian approach can also be applied to other important contexts such as model calibration and design optimization under uncertainty. As a result, all the open research questions have the potential to also revolutionize these fields.

## References

- 1. Aarnes, J.E., Kippe, V., Lie, K.A., Rustad, A.B.: Modelling of multiscale structures in flow simulations for petroleum reservoirs. In: Hasle, G., Lie, K.A., Quak, E. (eds.) Geometric Modelling, Numerical Simulation, and Optimization, chap. 10, pp. 307–360. Springer, Berlin/Heidelberg (2007). doi:10.1007/978-3-540-68783-2_10
- 2. Alvarez, M., Lawrence, N.D.: Sparse convolved Gaussian processes for multi-output regression. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 21 (NIPS 2008), Vancouver (2008)
- 3. Alvarez, M., Luengo-Garcia, D., Titsias, M., Lawrence, N.: Efficient multioutput Gaussian processes through variational inducing kernels. In: Ft. Lauderdale, FL, USA (2011)
- 4. Babuska, I., Nobile, F., Tempone, R.: A stochastic collocation method for elliptic partial differential equations with random input data. SIAM J. Numer. Anal. **45**(3), 1005–1034 (2007)
- 5. Betz, W., Papaioannou, I., Straub, D.: Numerical methods for the discretization of random fields by means of the Karhunen-Loeve expansion. Comput. Methods Appl. Mech. Eng. **271**, 109–129 (2014). doi:10.1016/j.cma.2013.12.010
- 6. Bilionis, I.: py-orthpol: construct orthogonal polynomials in Python. https://github.com/PredictiveScienceLab/py-orthpol (2013)
- 7. Bilionis, I., Zabaras, N.: Multi-output local Gaussian process regression: applications to uncertainty quantification. J. Comput. Phys. **231**(17), 5718–5746 (2012). doi:10.1016/j.jcp.2012.04.047
- 8. Bilionis, I., Zabaras, N.: Multidimensional adaptive relevance vector machines for uncertainty quantification. SIAM J. Sci. Comput. **34**(6), B881–B908 (2012). doi:10.1137/120861345
- 9. Bilionis, I., Zabaras, N.: Solution of inverse problems with limited forward solver evaluations: a Bayesian perspective. Inverse Probl. **30**(1), 015004 (2014). doi:10.1088/0266-5611/30/1/015004
- 10. Bilionis, I., Zabaras, N., Konomi, B.A., Lin, G.: Multi-output separable Gaussian process: towards an efficient, fully Bayesian paradigm for uncertainty quantification. J. Comput. Phys. **241**, 212–239 (2013). doi:10.1016/j.jcp.2013.01.011
- 11. Bilionis, I., Drewniak, B.A., Constantinescu, E.M.: Crop physiology calibration in the CLM. Geosci. Model Dev. **8**(4), 1071–1083 (2015). doi:10.5194/gmd-8-1071-2015
- 12. Bishop, C.M.: Pattern Recognition and Machine Learning. Information Science and Statistics. Springer, New York (2006)
- 13. Boyle, P., Frean, M.: Dependent Gaussian processes. In: Saul, L.K., Weiss, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems 17 (NIPS 2004), Whistler (2004)
- 14. Chen, P., Zabaras, N., Bilionis, I.: Uncertainty propagation using infinite mixture of Gaussian processes and variational Bayesian inference. J. Comput. Phys. **284**, 291–333 (2015)
- 15. Conti, S., O’Hagan, A.: Bayesian emulation of complex multi-output and dynamic computer models. J. Stat. Plan. Inference **140**(3), 640–651 (2010). doi:10.1016/j.jspi.2009.08.006
- 16. Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: A Bayesian approach to the design and analysis of computer experiments. Report, Oak Ridge National Laboratory (1988)
- 17. Currin, C., Mitchell, T., Morris, M., Ylvisaker, D.: Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J. Am. Stat. Assoc. **86**(416), 953–963 (1991). doi:10.2307/2290511
- 18. Dawid, A.P.: Some matrix-variate distribution theory: notational considerations and a Bayesian application. Biometrika **68**(1), 265–274 (1981)
- 19. Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B (Stat. Methodol.) **68**(3), 411–436 (2006)
- 20. Delves, L.M., Walsh, J.E. (eds.): Numerical Solution of Integral Equations. Clarendon Press, Oxford (1974)
- 21. Doucet, A., De Freitas, N., Gordon, N. (eds.): Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer, New York (2001)
- 22. Durrande, N., Ginsbourger, D., Roustant, O.: Additive covariance kernels for high-dimensional Gaussian process modeling. arXiv:1111.6233 (2011)
- 23. Duvenaud, D., Nickisch, H., Rasmussen, C.E.: Additive Gaussian processes. In: Advances in Neural Information Processing Systems, vol. 24, pp. 226–234 (2011)
- 24. Gautschi, W.: On generating orthogonal polynomials. SIAM J. Sci. Stat. Comput. **3**(3), 289–317 (1982). doi:10.1137/0903018
- 25. Gautschi, W.: Algorithm 726: ORTHPOL – a package of routines for generating orthogonal polynomials and Gauss-type quadrature rules. ACM Trans. Math. Softw. **20**(1), 21–62 (1994). doi:10.1145/174603.174605
- 26. Ghanem, R., Spanos, P.D.: Stochastic Finite Elements: A Spectral Approach, rev. edn. Dover Publications, Mineola (2003)
- 27. Gramacy, R.B., Lee, H.K.H.: Cases for the nugget in modeling computer experiments. Stat. Comput. **22**(3), 713–722 (2012). doi:10.1007/s11222-010-9224-x
- 28. Haff, L.: An identity for the Wishart distribution with applications. J. Multivar. Anal. **9**(4), 531–544 (1979). doi:10.1016/0047-259X(79)90056-3
- 29. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika **57**(1), 97–109 (1970). doi:10.2307/2334940
- 30. Higdon, D., Gattiker, J., Williams, B., Rightley, M.: Computer model calibration using high-dimensional output. J. Am. Stat. Assoc. **103**(482), 570–583 (2008)
- 31. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer Series in Statistics. Springer, New York (2001)
- 32. Loève, M.: Probability Theory, 4th edn. Graduate Texts in Mathematics. Springer, New York (1977)
- 33. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. **21**(6), 1087–1092 (1953). doi:10.1063/1.1699114
- 34. Oakley, J., O’Hagan, A.: Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika **89**(4), 769–784 (2002)
- 35. Oakley, J.E., O’Hagan, A.: Probabilistic sensitivity analysis of complex models: a Bayesian approach. J. R. Stat. Soc. Ser. B Stat. Methodol. **66**, 751–769 (2004). doi:10.1111/j.1467-9868.2004.05304.x
- 36. O’Hagan, A.: Bayes-Hermite quadrature. J. Stat. Plan. Inference **29**(3), 245–260 (1991)
- 37. O’Hagan, A., Kennedy, M.: Gaussian emulation machine for sensitivity analysis (GEM-SA) (2015). http://www.tonyohagan.co.uk/academic/GEM/
- 38. O’Hagan, A., Kennedy, M.C., Oakley, J.E.: Uncertainty analysis and other inference tools for complex computer codes. Bayesian Stat. **6**, 503–524 (1999)
- 39. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
- 40. Reinhardt, H.J.: Analysis of Approximation Methods for Differential and Integral Equations. Applied Mathematical Sciences. Springer, New York (1985)
- 41. Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer Texts in Statistics. Springer, New York (2004)
- 42. Sacks, J., Welch, W.J., Mitchell, T., Wynn, H.P.: Design and analysis of computer experiments. Stat. Sci. **4**(4), 409–423 (1989)
- 43. Seeger, M.: Low rank updates for the Cholesky decomposition. Report, University of California at Berkeley (2007)
- 44. Smolyak, S.A.: Quadrature and interpolation formulas for tensor products of certain classes of functions. Sov. Math. Dokl. **4**, 240–243 (1963)
- 45. Stark, H., Woods, J.W.: Probability and Random Processes with Applications to Signal Processing, 3rd edn. Prentice Hall, Upper Saddle River (2002)
- 46. Stegle, O., Lippert, C., Mooij, J.M., Lawrence, N.D., Borgwardt, K.M.: Efficient inference in matrix-variate Gaussian models with iid observation noise. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24 (NIPS 2011), Granada (2011)
- 47. Van Loan, C.F.: The ubiquitous Kronecker product. J. Comput. Appl. Math. **123**(1–2), 85–100 (2000)
- 48. Wan, J., Zabaras, N.: A Bayesian approach to multiscale inverse problems using the sequential Monte Carlo method. Inverse Probl. **27**(10), 105004 (2011)
- 49. Wan, X.L., Karniadakis, G.E.: An adaptive multi-element generalized polynomial chaos method for stochastic differential equations. J. Comput. Phys. **209**(2), 617–642 (2005). doi:10.1016/j.jcp.2005.03.023
- 50. Welch, W.J., Buck, R.J., Sacks, J., Wynn, H.P., Mitchell, T.J., Morris, M.D.: Screening, predicting, and computer experiments. Technometrics **34**(1), 15–25 (1992)
- 51. Xiu, D.B.: Efficient collocational approach for parametric uncertainty analysis. Commun. Comput. Phys. **2**(2), 293–309 (2007)
- 52. Xiu, D.B., Hesthaven, J.S.: High-order collocation methods for differential equations with random inputs. SIAM J. Sci. Comput. **27**(3), 1118–1139 (2005)
- 53. Xiu, D.B., Karniadakis, G.E.: The Wiener-Askey polynomial chaos for stochastic differential equations. SIAM J. Sci. Comput. **24**(2), 619–644 (2002)