Boosting Kernel-Based Dimension Reduction for Jointly Propagating Spatial Variability and Parameter Uncertainty in Long-Running Flow Simulators

Abstract

Assessing the impact of multiple sources of uncertainty in flow models requires simulating the response associated with a large set of stochastic realizations (\(>\!\!1,\!000\)) of both the spatial variable (e.g. the permeability field) and the large number of model parameters (e.g., parameters of the relative permeability curve). Yet, this Monte Carlo approach may be hindered by the high computational cost of long-running flow simulators (CPU time \(>\) several hours). A possible strategy relies on combining meta-modelling techniques with basis set expansion, such as kernel principal component analysis, to reduce the high dimensionality (\(>\!\!10,\!000\) grid cells) of the spatial variable. Using a synthetic heterogeneous channelized reservoir, it is shown that \(\sim \)50 principal components (PCs) are still necessary to retain \(\sim \)80 % of the total variability: this might complicate the construction of the meta-model by imposing a large set of training samples. This computational limitation can be overcome by improving the meta-modelling phase through boosting techniques, which select during the fitting process the input variables (PCs or model parameters) that are the most informative for prediction accuracy, and thereby reduce the number of necessary model runs. The procedure is applied to a synthetic flow model of CO\(_{2}\) geological storage with low CPU time. By comparing the P10, P50 and P90 quantiles of the pressure build-up estimated with the true simulator (1,500 simulations) to those estimated with the meta-model, the joint impact of spatial variability and of 12 uncertain parameters is shown to be accurately captured using a number of training samples of the order of the number of PCs (\(\sim \)a few tens). The level of performance proves to be as high as that of the distance-based alternative proposed in the literature to address the problem of joint propagation.

References

  • Arnold D, Demyanov V, Tatum D, Christie MA, Rojas TS, Geiger-Boschung S, Corbett PWM (2013) Hierarchical benchmark case study for history matching, uncertainty quantification and reservoir characterization. Comput Geosci 50:4–15

  • Bazargan H, Christie MA, Tchelepi H (2013) Efficient Markov chain Monte Carlo sampling using polynomial Chaos expansion. In: SPE reservoir simulation symposium. The Woodlands, Texas. SPE 163663

  • Birkholzer JT, Zhou Q, Tsang C-F (2009) Large-scale impact of CO\(_{2}\) storage in deep saline aquifers: a sensitivity study on pressure response in stratified systems. Int J Greenhouse Gas Control 3(2):181–194

  • Bouc O, Bellenfant G, Dubois D, Guyonnet D, Rohmer J, Wertz F, Gastine M, Jacquemet N, Vong CQ, Grataloup S, Picot-Colbeaux G, Fabriol H (2011) Safety criteria for CO\(_2\) geological storage: determination workflow and application in the context of the Paris Basin. Energy Procedia 4:4020–4027

  • Bouquet S, de Fouquet C, Bruel D (2013) Optimization of CO\(_2\) storage assessment using selection of stochastic realisations. In: 7th international conference on sensitivity analysis of model output, 1–4 July 2013, Nice, France

  • Bühlmann P (2006) Boosting for high-dimensional linear models. Ann Stat 34:559–583

  • Bühlmann P, Hothorn T (2007) Boosting algorithms: regularization, prediction and model fitting (with discussion). Stat Sci 22:477–522

  • Busby D, Romary T, Touzani S, Feraille M, Noetinger B, Hu LY (2007) Reservoir forecasting under uncertainty: an integrated approach. In: International meeting on complexity in oil industry, 5–9 August 2007, Natal, Brazil

  • Caers J, Zhang T (2004) Multiple-point geostatistics: a quantitative vehicle for integrating geologic analogs into multiple reservoir models. AAPG Mem 80:383–394

  • Caers J, Scheidt C (2011) Integration of engineering and geological uncertainty for reservoir performance prediction using a distance-based approach. AAPG Memoir 96:191–202

  • Fetel E, Caumon G (2008) Reservoir flow uncertainty assessment using response surface constrained by secondary information. J Petrol Sci Eng 60:170–182

  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • Hemez FM, Atamturktur S (2011) The dangers of sparse sampling for the quantification of margin and uncertainty. Reliab Eng Syst Saf 96(9):1220–1231. doi:10.1016/j.ress.2011.02.015

  • Hothorn T, Bühlmann P, Kneib T, Schmid M, Hofner B (2010) Model-based boosting 2.0. J Mach Learn Res 11:2109–2113

  • Hou Z, Engel DW, Lin G, Fang Y, Fang Z (2013) An uncertainty quantification framework for studying the effect of spatial heterogeneity in reservoir permeability on CO\(_{2}\) sequestration. Math Geosci 45(7):799–817. doi:10.1007/s11004-013-9459-0

  • Issautier B, Viseur S, Audigane P, le Nindre Y-M (2014) Impacts of fluvial reservoir heterogeneity on connectivity: implications in estimating geological storage capacity for CO2. Int J Greenhouse Gas Control 20:333–349

  • Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, Berlin

  • Koehler JR, Owen AB (1996) Computer experiment. In: Ghosh S, Rao CR (eds) Handbook of statistics. Elsevier Science, New York, pp 261–308

  • Ma X, Zabaras N (2011) Kernel principal component analysis for stochastic input model generation. J Comput Phys 230:7311–7331

  • McKay MD, Beckman RJ, Conover WJ (1979) A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21:239–245

  • Mika S, Schölkopf B, Smola A, Muller KR, Scholz M, Ratsch G (1999) Kernel PCA and denoising in feature spaces. Adv Neural Inf Process Syst, vol 11. MIT Press, Massachusetts, pp 536–542

  • Norden B, Frykman P (2013) Geological modelling of the Triassic Stuttgart formation at the Ketzin CO\(_{2}\) storage site, Germany. Int J Greenhouse Gas Control 19:756–774. http://dx.doi.org/10.1016/j.ijggc.2013.04.019

  • Pappenberger F, Beven KJ (2006) Ignorance is bliss: or seven reasons not to use uncertainty analysis. Water Resour Res 42:W05302. doi:10.1029/2005WR004820

  • Pruess K (2005) ECO2N: a TOUGH2 fluid property module for mixtures of water, NaCl, and CO2. Report LBNL-57952. Lawrence Berkeley National Laboratory, Berkeley

  • R Development Core Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0. http://www.R-project.org/

  • Rathi Y, Dambreville S, Tannenbaum A (2006) Statistical shape analysis using kernel PCA. In: Image processing: algorithms and systems, neural networks, and machine learning, SPIE, p 60641B

  • Rohmer J (2014) Dynamic sensitivity analysis of long-running landslide models through basis set expansion and meta-modelling. Nat Hazards 73(1):5–22. doi:10.1007/s11069-012-0536-3

  • Sarma P, Durlofsky LJ, Aziz K (2008) Kernel principal component analysis for efficient, differentiable parameterization of multipoint geostatistics. Math Geosci 40:3–32. doi:10.1007/s11004-007-9131-7

  • Scheidt C, Caers J (2008) Representing spatial uncertainty using distances and kernels. Math Geosci 41(4):397–419. doi:10.1007/s11004-008-9186-0

  • Schölkopf B, Smola A, Muller K-R (1998) Nonlinear component analysis as a kernel eigenvalue problem. Neural Comput 10:1299–1319

  • Strebelle S (2002) Conditional simulation of complex geological structures using multiple-point statistics. Math Geol 34:1–21

  • Sun AY, Zeidouni M, Nicot J-P, Lu Z, Zhang D (2013) Assessing leakage detectability at geologic CO2 sequestration sites using the probabilistic collocation method. Adv Water Res 56:49–60. doi:10.1016/j.advwatres.2012.11.017

  • Zhang D, Lu Z (2004) An efficient, high-order perturbation approach for flow in random porous media via Karhunen–Loève and polynomial expansions. J Comput Phys 194:773–794


Acknowledgments

The research leading to these results has been carried out in the framework of the ULTIMATE-CO2 Project, funded by the European Commission’s Seventh Framework Program [FP7/2007-2013] under grant agreement n\(^{\circ }\) 281196. We thank the anonymous reviewer for their comments, which led to significant improvements to this article.

Author information

Corresponding author

Correspondence to J. Rohmer.

Appendices

Appendix A: Kernel Principal Component Analysis

Here the basic concepts of kernel principal component analysis (KPCA) are introduced. A more detailed introduction can be found in Schölkopf et al. (1998).

The study focuses on the most common case in flow modelling, namely spatial data discretized on a mesh grid (i.e. matrices of large but finite dimension). The preliminary step is to transform the three-dimensional \(N_{x}\times N_{y}\times N_{z}\) matrices associated with the spatial inputs into 1D vectors \(\varvec{z}(r)\), where the index \(r\) takes finite values in the set \((1, 2, \ldots, N_\mathrm{F})\) with \(N_\mathrm{F}=N_{x}\times N_{y}\times N_{z}\). Consider \(n_{0}\) such vectors \(\varvec{z}_{i}(r)\) (with \(i=1,\ldots,n_{0}\)), each discretized as an \(N_\mathrm{F}\)-vector, and define \(\varvec{Z}\) as the \(n_{0}\times N_\mathrm{F}\) matrix whose rows are these vectors. The objective of basis set expansion is to reduce this set of vectors to projected vectors of finite number \(d\ll N_\mathrm{F}\), so that they describe the key features of the spatial evolution of the considered variable, i.e. the dominant modes of variation in space. This can be achieved by expanding the spatial input in an appropriate functional coordinate system in terms of basis functions of the \(r\) index, \(\varphi _j (r)\) (with \(j=1,\ldots,d\)). The basis set expansion of the set of centred vectors \(\varvec{z}^C(r)\) reads as

$$\begin{aligned} \varvec{z}_i^C ( r)=\varvec{z}_i ( r)-\varvec{\bar{z}}( r)\approx \sum \limits _{j=1}^d {\lambda _{ij} \varphi _j ( r)} \end{aligned}$$
(3)

where the mean spatial function \(\bar{\varvec{z}}(r)\) is computed as the mean of the \(\varvec{z}_i (r)\) for each \(r\) index. The scalar expansion coefficients \(\lambda _{ij}\) indicate the weight (contribution) of each of the \(d\) basis functions in each of the \(n_{0}\) vectors. Usually, the dimension \(d\) is chosen so that most of the information is concentrated in the first \(d\) basis functions, i.e. so that the variance in the set of spatial maps is explained at a minimum level of, say, 80 %. The basis functions can be determined from the data. The classical data-driven method is multivariate principal component analysis, denoted PCA (Jolliffe 2002), and more specifically its non-linear formulation, known as kernel PCA (KPCA).
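
As a minimal illustration of this truncation rule, the following Python sketch (an addition for illustration only, not part of the original study; the function name pca_expansion and variable names are hypothetical) performs a plain linear PCA of the matrix \(\varvec{Z}\) and retains the smallest \(d\) explaining at least 80 % of the variance.

import numpy as np

def pca_expansion(Z, var_level=0.80):
    """Linear PCA expansion of the n0 x N_F matrix Z (one realization per row).

    Returns the expansion coefficients lambda_ij, the retained basis
    functions phi_j(r), the mean map z_bar(r) and the dimension d (Eq. 3).
    """
    z_bar = Z.mean(axis=0)                  # mean spatial function
    Zc = Z - z_bar                          # centred realizations z_i^C(r)
    # The SVD of Zc gives the same basis as the eigen-decomposition of the
    # covariance matrix, without forming the N_F x N_F matrix explicitly.
    U, s, Vt = np.linalg.svd(Zc, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    d = int(np.searchsorted(np.cumsum(explained), var_level) + 1)
    coeffs = U[:, :d] * s[:d]               # lambda_ij: weights per realization
    basis = Vt[:d]                          # phi_j(r): d dominant spatial modes
    return coeffs, basis, z_bar, d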

Consider a non-linear mapping \(\varPhi \) that relates the input space \(R^{N_\mathrm{F}}\) to another space \(F\) (referred to as the feature space), which is assumed to have a better linear structure than \(R^{N_\mathrm{F}}\), meaning that points become linearly separable in \(F\). The vectors \(\varvec{z}\) can then be mapped into this feature space \(F\) (and centred by subtracting the \(\varPhi \)-mean of the mapped data), resulting in the zero-mean vectors \(\varPhi ({\varvec{z}})\). The objective is to solve the eigenvalue problem described by

$$\begin{aligned} \varvec{C}\cdot \varvec{V}=\lambda \varvec{V} \end{aligned}$$
(4a)

with \(\lambda \) the eigenvalues and \(\varvec{V}\) the eigenvectors of the \(N_\mathrm{F}\times N_\mathrm{F}\) covariance matrix \(\varvec{C}\) defined in the feature space \(F\) as

$$\begin{aligned} \varvec{C}=\frac{1}{n_0 }\sum \limits _{i=1}^{n_0 } {\varPhi ( {\varvec{z}_i })\cdot \varPhi ( {\varvec{z}_i})^\mathrm{T}} \end{aligned}$$
(4b)

The eigenvectors \(\varvec{V}\) and eigenvalues \(\lambda \), respectively, correspond to the principal components (denoted PCs) and the weights, which are used to decompose the spatial variable in the form of Eq. 3.

Following Schölkopf et al. (1998), it can be shown that there exist coefficients \(\alpha _{ij}\) such that the \(i\)th eigenvector \(\varvec{V}_{\!i}\) can be expanded as

$$\begin{aligned} \varvec{V}_{\!i} =\sum \limits _{j=1}^{n_0 } {\alpha _{ij} \varPhi ( {{\varvec{z}}_j})} \end{aligned}$$
(4c)

To overcome the difficulties of computing \(\varvec{C}\) when \(N_\mathrm{F}\) is high, Eq. 4a can be formulated as an equivalent kernel eigenvalue problem as

$$\begin{aligned} n_0 \lambda _i \varvec{\alpha }_i ={\varvec{K}}\cdot \varvec{\alpha }_i ,\quad i=1,\ldots ,n_0 \end{aligned}$$
(4d)

The \(n_{0}\times n_{0}\) matrix \(\varvec{K}\) is the kernel matrix, whose elements \(K_{ij}\) read as the dot product of vectors in \(F\), namely \(K_{ij} =({\varPhi ( {{\varvec{z}}_i })\cdot \varPhi ( {{\varvec{z}}_j })})\). The eigenvalues \(\lambda \) and eigenvectors \(\varvec{V}\) can then be derived from the eigenvalue decomposition of \(\varvec{K}\), which yields \(n_{0}\lambda \) and \(\varvec{\alpha }\). The efficiency of the kernel formulation of the eigenvalue problem resides in the calculation of \(\varvec{K}\), which does not require the explicit calculation of the mapping \(\varPhi (\varvec{z})\) of very high dimension \(N_\mathrm{F}\), but only the dot product of vectors in \(F\) using kernel functions \(k\), which can be of various kinds (for instance polynomial or Gaussian). This corresponds to the so-called kernel trick. Note that KPCA with a first-order polynomial (linear) kernel is equivalent to traditional PCA; in this sense, KPCA can be viewed as a generalization of linear PCA.

A popular choice is the Gaussian kernel, which reads as

$$\begin{aligned} K_{ij} =k( {\varvec{z}_i ,\varvec{z}_j })=\exp ( {-\left\| {\varvec{z}_i - \varvec{z}_j } \right\| ^2/2\sigma ^{2}}) \end{aligned}$$
(5)

with \(\left\| {\varvec{z}_i - \varvec{z}_j } \right\| ^{2}\) the squared Euclidean distance, and \(\sigma >0\) the kernel width controlling the flexibility of the kernel: a larger \(\sigma \) allows more mixing between elements of the realizations, whereas a lower value implies using only a few significant realizations. Several robust methods exist to choose its value. The present study relies on the procedure of Rathi et al. (2006), which defines \(\sigma \) as the average minimum distance between two realizations in the input space; this yields \(\sigma \sim 150\) for the case described in Sect. 2.3.
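
For concreteness, a compact Python sketch of this construction is given below; it is an illustration only (the function name gaussian_kpca and variable names are hypothetical), assembling the Gaussian kernel matrix of Eq. 5 with the kernel width chosen by the average-minimum-distance heuristic of Rathi et al. (2006), centring it in feature space, and solving the kernel eigenvalue problem of Eq. 4d.

import numpy as np

def gaussian_kpca(Z, d, sigma=None):
    """Kernel PCA of the n0 x N_F matrix Z with the Gaussian kernel of Eq. 5."""
    n0 = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)  # squared distances
    if sigma is None:
        # Rathi et al. (2006): average minimum distance between two realizations
        off_diag = D2 + np.diag(np.full(n0, np.inf))
        sigma = np.mean(np.sqrt(off_diag.min(axis=1)))
    K = np.exp(-D2 / (2.0 * sigma ** 2))                # kernel matrix K_ij (Eq. 5)
    # Centre the mapped data in feature space (subtract the Phi-mean)
    J = np.ones((n0, n0)) / n0
    Kc = K - J @ K - K @ J + J @ K @ J
    # Eigen-decomposition of Kc solves the kernel eigenvalue problem (Eq. 4d)
    eigval, alpha = np.linalg.eigh(Kc)
    order = np.argsort(eigval)[::-1][:d]                # keep the d leading PCs
    eigval, alpha = eigval[order], alpha[:, order]
    alpha = alpha / np.sqrt(np.maximum(eigval, 1e-12))  # unit-norm eigenvectors V_i in F
    scores = Kc @ alpha                                 # projections onto the d PCs
    return scores, alpha, sigma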

Besides reducing the high dimensionality of the spatial variable through its expansion in \(F\), the reverse procedure is also required, namely obtaining realizations back in the physical input space. While the decomposition does not require the explicit formulation of the mapping \(\varPhi \), the reverse procedure is more complicated (a problem known as the pre-image problem), because the pre-image (the inverse mapping) may not exist, and even if it exists, it may not be unique, the unknown mapping being non-linear. Solving this problem implies addressing a non-linear optimization, for instance using the gradient-based techniques of Mika et al. (1999).
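
For the Gaussian kernel, one common way to approach the pre-image problem, in the spirit of Mika et al. (1999), is a fixed-point iteration; the sketch below is again an illustrative assumption (hypothetical names), and it presumes that the feature-space point of interest is expressed by expansion coefficients gamma over the mapped training realizations.

import numpy as np

def gaussian_preimage(Z, gamma, sigma, n_iter=200, tol=1e-8):
    """Approximate pre-image of Psi = sum_i gamma_i Phi(z_i) for a Gaussian kernel.

    Fixed-point iteration in the spirit of Mika et al. (1999):
        z <- sum_i w_i z_i / sum_i w_i,  with  w_i = gamma_i * k(z, z_i).
    """
    z = Z[np.argmax(np.abs(gamma))].copy()   # start from the most weighted realization
    for _ in range(n_iter):
        k = np.exp(-np.sum((Z - z) ** 2, axis=1) / (2.0 * sigma ** 2))
        w = gamma * k
        denom = w.sum()
        if abs(denom) < 1e-12:               # degenerate case: stop with current estimate
            break
        z_new = (w[:, None] * Z).sum(axis=0) / denom
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    return z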

Appendix B: Gradient Boosting Algorithm

This appendix describes the boosting algorithm to find the optimal prediction function \(f^{*}\) defined by

$$\begin{aligned} f^*=\arg \min \nolimits _f \mathrm{E}_{Y,\varvec{X}} \left[ {L( {y,f(\varvec{x})})} \right] \end{aligned}$$
(6)

In practice, the expected value is not known, and boosting algorithms minimize the observed mean derived from the training data (also known as the empirical risk \(R\)), defined as

$$\begin{aligned} R=\sum \limits _{i=1}^n {L( {y_i ,f( {\varvec{x}_i })})} \end{aligned}$$
(7)

The following iterative algorithm (known as component-wise gradient boosting) is used to minimize \(R\) over \(f\). Note that in the sequel, the vector of length \(n\) of function estimates at iteration \(m\) is denoted \(f_{\left[ m \right] }^{*}\).

1. Initialize the function estimate \(f^{*}_{[0]}\) with offset values.

2. Specify a set of base-learners. Base-learners are simple regression estimators with a fixed set of covariates and a univariate response. The simplest case corresponds to a linear model for each covariate. More sophisticated base-learners can be chosen, such as splines (univariate or multivariate, Bühlmann and Hothorn 2007) or random forests (Friedman 2001). Each base-learner represents a modelling alternative for the statistical model. Denote the number of base-learners by \(P\) and set \(m=0\).

3. Increase \(m\) by 1.

4. (a) Compute the negative gradient \(-\partial L/\partial f\) of the loss function and evaluate it at \(f_{\left[ {m-1} \right] }^*( {\varvec{x}_i }), i=1,\ldots ,n\) (the estimate of the previous iteration). This yields the negative gradient vector defined as

$$\begin{aligned} \varvec{u}_{\left[ m \right] } =\left( {-\frac{\partial L( {y_i ,f_{\left[ {m-1} \right] }^*( {\varvec{x}_i })})}{\partial f}}\right) ,\quad i=1,\ldots ,n \end{aligned}$$
(8)

(b) Fit each of the \(P\) base-learners (i.e. the \(P\) regression estimators specified in step 2) separately to the negative gradient vector. This yields \(P\) vectors of predicted values, where each vector is an estimate of the negative gradient vector \(\varvec{u}_{[m]}\).

(c) Select the base-learner that fits \(\varvec{u}_{[m]}\) best according to the residual sum of squares criterion (i.e. the sum of squared errors) and set \(\varvec{u}_{\left[ m \right] }^*\) equal to the fitted values of the best-fitting base-learner.

(d) Update the current estimate: \(f_{\left[ m \right] }^{*} =f_{\left[ {m-1} \right] }^{*} +v\cdot \varvec{u}_{\left[ m \right] }^{*}\), where \(v\) is a real-valued step length factor ranging from 0 to 1.

5. Iterate steps 3 and 4 until the stopping iteration \(m_\mathrm{stop}\) is reached (see below for its choice).

By construction (step 4), an estimate of the true negative gradient of \(R\) is added to the current estimate of \(f^{*}\) in each iteration \(m\) so that the component-wise boosting algorithm descends along the gradient of \(R\). Hence, \(R\) is minimized in a stage-wise manner, and a (regression) relationship between \(y\) and \(\varvec{x}\) is established. At steps 4(c) and 4(d), the algorithm performs variable selection (and model choice), since only one base-learner is selected for updating \(f_{\left[ m \right] }^{*}\) in each iteration.
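
As an illustration only (not the implementation used in the study), the following Python sketch instantiates this component-wise scheme for the quadratic loss \(L(y,f)=(y-f)^{2}/2\) with simple linear base-learners, one per covariate and without intercept (the covariates are assumed to be centred); for this loss the negative gradient vector is simply the vector of current residuals.

import numpy as np

def componentwise_l2_boosting(X, y, m_stop=100, v=0.1):
    """Component-wise gradient boosting with linear base-learners and L2 loss.

    X is assumed to hold centred covariates (one column per PC or model
    parameter); covariates never selected in step 4(c) keep a zero coefficient.
    """
    n, p = X.shape
    intercept = y.mean()
    f = np.full(n, intercept)                 # step 1: offset value
    coef = np.zeros(p)
    for _ in range(m_stop):                   # steps 3 and 5
        u = y - f                             # step 4(a): negative gradient of the L2 loss
        best_j, best_beta, best_rss = 0, 0.0, np.inf
        for j in range(p):                    # step 4(b): fit each linear base-learner to u
            xj = X[:, j]
            beta = xj @ u / max(xj @ xj, 1e-12)
            rss = np.sum((u - beta * xj) ** 2)
            if rss < best_rss:                # step 4(c): keep the best-fitting base-learner
                best_j, best_beta, best_rss = j, beta, rss
        coef[best_j] += v * best_beta         # step 4(d): update with step length v
        f += v * best_beta * X[:, best_j]
    return coef, intercept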

The choice of the stopping iteration \(m_\mathrm{stop}\) is of primary importance to prevent over-fitting. Bühlmann and Hothorn (2007) advocate an early stopping strategy in which \(m_\mathrm{stop}\) becomes a tuning parameter of the algorithm, chosen to optimize the prediction accuracy; cross-validation or AIC-based (Akaike Information Criterion) techniques can be used to estimate it. The choice of the step length factor \(v\) has been shown to have a minor influence on the predictive performance of a boosting algorithm; the only requirement is that the value of \(v\) remains small (e.g., \(v=0.1\)).
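
A simple way to treat \(m_\mathrm{stop}\) as a tuning parameter in practice, sketched below under the same illustrative assumptions (it reuses the componentwise_l2_boosting function above and a hypothetical hold-out split), is to refit the model for a grid of candidate values and keep the one minimizing the validation error; cross-validation or AIC-based selection would proceed analogously.

import numpy as np

def select_m_stop(X_train, y_train, X_val, y_val, grid=(25, 50, 100, 200, 400), v=0.1):
    """Hold-out selection of m_stop: refit the boosting sketch for each candidate
    value and keep the one with the smallest validation mean squared error."""
    best_m, best_err = grid[0], np.inf
    for m in grid:
        coef, intercept = componentwise_l2_boosting(X_train, y_train, m_stop=m, v=v)
        err = np.mean((y_val - (intercept + X_val @ coef)) ** 2)
        if err < best_err:
            best_m, best_err = m, err
    return best_m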

Cite this article

Rohmer, J. Boosting Kernel-Based Dimension Reduction for Jointly Propagating Spatial Variability and Parameter Uncertainty in Long-Running Flow Simulators. Math Geosci 47, 227–246 (2015). https://doi.org/10.1007/s11004-014-9551-0
