# Boosting Kernel-Based Dimension Reduction for Jointly Propagating Spatial Variability and Parameter Uncertainty in Long-Running Flow Simulators

## Abstract

Assessing the impact of multiple sources of uncertainty in flow models requires simulating the response for a large set of stochastic realizations (\(>1{,}000\)) of both the spatial variable (e.g., the permeability field) and the numerous model parameters (e.g., those of the relative permeability curve). Yet this Monte Carlo approach may be hindered by the high computational cost of long-running flow simulators (CPU time exceeding several hours per run). A possible strategy is to combine meta-modelling techniques with a basis set expansion, such as kernel principal component analysis, which reduces the high dimensionality (\(>10{,}000\) grid cells) of the spatial variable. Using a synthetic heterogeneous channelized reservoir, it is shown that \(\sim\)50 principal components (PCs) are still necessary to retain \(\sim\)80% of the total variability; this can hinder the construction of the meta-model, because it imposes a large set of training samples. This computational limitation can be overcome by improving the meta-modelling phase through boosting techniques, which select, during the fitting process, the input variables (PCs or model parameters) that are most informative for prediction accuracy, and thereby reduce the number of necessary model runs. The procedure is applied to a synthetic flow model of CO\(_{2}\) geological storage with low CPU time, and the P10, P50 and P90 quantiles of the pressure build-up estimated using the true simulator (1,500 simulations) are compared with those estimated using the meta-model. The joint impact of spatial variability and of 12 uncertain parameters is shown to be accurately captured using a number of training samples of the order of the number of PCs (a few tens). This performance proved to be as high as that of the distance-based alternative proposed in the literature for the joint propagation problem.
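The two steps summarized above (kernel PCA of the spatial realizations, then a boosted meta-model that selects among PCs and scalar parameters) can be illustrated with the minimal NumPy sketch below. It is not the authors' implementation. The abstract does not specify the kernel, the base learners or the boosting settings, so this sketch assumes a Gaussian kernel with bandwidth `gamma = 1/n_cells`, component-wise L2 boosting with one-variable linear least-squares base learners, a step size of 0.1 and 300 iterations. The helper names `kernel_pca` and `componentwise_l2_boost`, the white-noise "fields" and the made-up response are all hypothetical stand-ins for the channelized reservoir and the CO\(_{2}\) flow simulator. Kernel PCA is computed on all realizations, which are cheap to generate, while the "simulator" is evaluated only on a small training subset.

```python
import numpy as np


def kernel_pca(fields, n_components, gamma):
    """Kernel PCA of flattened spatial fields with a Gaussian (RBF) kernel.

    fields: (n_realizations, n_cells) array. gamma is an assumed bandwidth.
    Returns PC scores (n_realizations, n_components) and the cumulative
    fraction of feature-space variance retained by those components.
    """
    sq_norms = np.sum(fields ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * fields @ fields.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the Gram matrix in feature space (K is symmetric).
    means = K.mean(axis=0)
    Kc = K - means[None, :] - means[:, None] + K.mean()
    eigvals, eigvecs = np.linalg.eigh(Kc)                # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    eigvals = np.clip(eigvals, 0.0, None)                # drop round-off negatives
    retained = np.cumsum(eigvals) / eigvals.sum()
    # Score of realization i on PC k is sqrt(lambda_k) * v_ik.
    scores = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return scores, retained[:n_components]


def componentwise_l2_boost(X, y, n_steps=300, nu=0.1):
    """Component-wise L2 boosting with one-variable linear base learners.

    Each step updates only the input whose least-squares fit to the current
    residuals reduces the residual sum of squares the most, so inputs that
    are never chosen keep a zero coefficient (implicit variable selection).
    """
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    col_ss = np.sum(Xc ** 2, axis=0)
    offset = y.mean()
    resid = y - offset
    coef = np.zeros(X.shape[1])
    for _ in range(n_steps):
        betas = Xc.T @ resid / col_ss                   # slope for each input
        rss = resid @ resid - betas ** 2 * col_ss       # RSS after each fit
        j = int(np.argmin(rss))
        coef[j] += nu * betas[j]                        # shrunken update
        resid -= nu * betas[j] * Xc[:, j]
    intercept = offset - x_mean @ coef
    return intercept, coef


# Hypothetical setup: white-noise fields stand in for the permeability
# realizations, and a made-up response stands in for the flow simulator.
rng = np.random.default_rng(0)
n_all, n_train, n_cells, n_params = 300, 40, 400, 12
fields = rng.normal(size=(n_all, n_cells))
scores, retained = kernel_pca(fields, n_components=10, gamma=1.0 / n_cells)
params = rng.uniform(-1.0, 1.0, size=(n_all, n_params))
X = np.hstack([scores, params])                        # 10 PCs + 12 parameters
z1 = scores[:, 0] / scores[:, 0].std()
y = 2.0 * z1 - 1.5 * params[:, 2] + rng.normal(scale=0.05, size=n_all)

# "Run the simulator" only on the first n_train realizations.
intercept, coef = componentwise_l2_boost(X[:n_train], y[:n_train])
selected = np.flatnonzero(coef)                        # inputs kept by boosting
pred = intercept + X @ coef                            # meta-model on all runs
p_meta = np.percentile(pred, [10, 50, 90])
p_true = np.percentile(y, [10, 50, 90])
```

Because the residual reduction achieved by a one-variable least-squares fit, \((x_j^\top u)^2 / (x_j^\top x_j)\), does not depend on the scale of \(x_j\), PC scores and physical parameters can be mixed in the selection step without standardization. The number of boosting steps acts as the regularization parameter and would in practice be tuned, e.g., by cross-validation.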

## Keywords

Computationally intensive flow model, Spatially dependent input, High-dimension, Basis set expansion, Variable selection, Component-wise gradient boosting

## Notes

### Acknowledgments

The research leading to these results has been carried out in the framework of the ULTIMATE-CO2 Project, funded by the European Commission’s Seventh Framework Program [FP7/2007-2013] under grant agreement no. 281196. We thank the anonymous reviewer for their comments, which led to significant improvements to this article.
