On the Construction of Uncertain Time Series Surrogates Using Polynomial Chaos and Gaussian Processes

Sochala, Pierre; Iskandarani, Mohamed

doi:10.1007/s11004-019-09806-8

Published: 20 May 2019

Mathematical Geosciences, volume 52, pages 285–309 (2020)

Pierre Sochala¹ & Mohamed Iskandarani²

Abstract

The analysis of time series is a fundamental task in many flow simulations such as oceanic and atmospheric flows. A major challenge is the design of a faithful and accurate time-dependent surrogate built with a tractable sample set and a manageable number of degrees of freedom. Several techniques are implemented to handle the time-dependent aspect of the quantity of interest including uncoupled approaches, low-rank approximations, auto-regressive models and global Bayesian emulators. These approaches rely on two popular methods for uncertainty quantification: polynomial chaos and Gaussian process regression. The different techniques are tested and compared on the uncertain evolution of the sea surface height forecast at two locations exhibiting contrasting levels of variance. Two ensemble sizes are considered as well as two versions of polynomial chaos (ordinary least squares or ridge regression) and Gaussian processes (squared exponential or Matérn covariance function) in order to assess their impact on the results. The conclusions focus on the advantages and the drawbacks, in terms of accuracy, flexibility and computational costs of the different techniques.

References

Alemazkoor N, Meidani H (2017) Divide and conquer: an incremental sparsity promoting compressive sampling approach for polynomial chaos expansions. Comput Methods Appl Mech Eng 318:937–956
Alexanderian A, Le Maître O, Najm H, Iskandarani M, Knio O (2012) Multiscale stochastic preconditioners in non-intrusive spectral projection. SIAM J Sci Comput 50(2):306–340
Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 230(6):2345–2367
Bleck R (2002) An oceanic general circulation model framed in hybrid isopycnic-cartesian coordinates. Ocean Model 4(1):55–88
Bowman A, Azzalini A (1997) Applied smoothing techniques for data analysis. Oxford University Press, New York
Cameron RH, Martin WT (1947) The orthogonal development of nonlinear functionals in series of Fourier–Hermite functionals. Ann Math 48:385–392
Chassignet E, Smith L, Halliwell G, Bleck R (2003) North Atlantic simulation with the hybrid coordinate ocean model (HYCOM): impact of the vertical coordinate choice, reference density, and thermobaricity. J Phys Oceanogr 33:2504–2526
Conrad PR, Marzouk YM (2013) Adaptive Smolyak pseudospectral approximations. SIAM J Sci Comput 35(6):A2643–A2670
Conti S, O’Hagan A (2010) Bayesian emulation of complex multi-output and dynamic computer models. J Stat Plan Infer 140(3):640–651
Doostan A, Owhadi H (2011) A non-adapted sparse approximation of PDEs with stochastic inputs. J Comput Phys 230(8):3015–3034
Ernst OG, Mugler A, Starkloff HJ, Ullmann E (2012) On the convergence of generalized polynomial chaos expansions. ESAIM Math Model Numer Anal 46:317–339
Gerritsma M, van der Steen JB, Vos P, Karniadakis G (2010) Time-dependent generalized polynomial chaos. J Comput Phys 229(22):8333–8363
Ghanem RG, Spanos PD (1991) Stochastic finite elements: a spectral approach. Springer, Berlin
Gibbs MN (1997) Bayesian Gaussian processes for regression and classification. Ph.D. thesis, Department of Physics, University of Cambridge
Greengard L, Rokhlin V (1987) A fast algorithm for particle simulations. J Comput Phys 73(2):325–348
Iskandarani M, Le Hénaff M, Thacker WC, Srinivasan A, Knio OM (2016a) Quantifying uncertainty in Gulf of Mexico forecasts stemming from uncertain initial conditions. J Geophys Res Oceans 121(7):4819–4832
Iskandarani M, Wang S, Srinivasan A, Thacker WC, Winokur J, Knio O (2016b) An overview of uncertainty quantification techniques with application to oceanic and oil-spill simulations. J Geophys Res Oceans 121(4):2789–2808
Kocijan J, Girard A, Banko B, Murray-Smith R (2005) Dynamic systems identification with Gaussian processes. Math Comput Model Dyn Syst 11(4):411–424
Le Gratiet L, Marelli S, Sudret B (2016) Metamodel-based sensitivity analysis: polynomial chaos expansions and Gaussian processes. Springer, Cham, pp 1–37 ISBN 978-3-319-11259-6
Le Maître O, Najm H, Ghanem R, Knio O (2004) Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J Comput Phys 197(2):502–531
Le Maître OP, Knio OM (2010) Spectral methods for uncertainty quantification. Springer, Berlin
Li G, Iskandarani M, Le Hénaff M, Winokur J, Le Maître OP, Knio OM (2016) Quantifying initial and wind forcing uncertainties in the Gulf of Mexico. Comput Geosci 20(5):1133–1153
Lorenz EN (1956) Empirical orthogonal functions and statistical weather prediction. Scientific report / MIT, Statistical Forecasting Project, Massachusetts Institute of Technology, Department of Meteorology
Mai CV, Spiridonakos MD, Chatzi EN, Sudret B (2016) Surrogate modeling for stochastic dynamical systems by combining nonlinear autoregressive with exogeneous input models and polynomial chaos expansions. Int J Uncertain Quant 6(4):313–339
Matheron G (1973) The intrinsic random functions and their applications. Adv Appl Probab 5(3):439–468
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Morokoff WJ, Caflisch RE (1995) Quasi-Monte Carlo integration. J Comput Phys 122(2):218–230
Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin ISBN 0387947248
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, Berlin
Owen NE, Challenor P, Menon PP, Bennani S (2017) Comparison of surrogate-based uncertainty quantification methods for computationally expensive simulators. SIAM/ASA J Uncertain Quant 5(1):403–435
Pronzato L, Müller WG (2012) Design of computer experiments: space filling and beyond. Stat Comput 22(3):681–701
Rasmussen CE, Williams CKI (2005) Gaussian processes for machine learning. The MIT Press, Cambridge ISBN 026218253X
Roy PT, Moçayd NE, Ricci S, Jouhaud JC, Goutal N, De Loco M, Rochoux MC (2017) Comparison of polynomial chaos and Gaussian process surrogates for uncertainty quantification and correlation estimation of spatially distributed open-channel steady flows. Stoch Env Res Risk A ISSN 1436–3259
Sampson PD, Guttorp P (1992) Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc 87(417):108–119
Santner TJ, Williams B, Notz W (2003) The design and analysis of computer experiments. Springer, Berlin
Seber GAF, Lee AJ (2003) Linear regression analysis. Wiley, New York ISBN 9780471722199
Smolyak S (1963) Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl Akad Nauk SSSR 4(240–243):123
Sochala P, De Martin F (2017) Surrogate combining harmonic decomposition and polynomial chaos for seismic shear waves in uncertain media. Comput Geosci 22(1):125–144
Spiridonakos MD, Chatzi EN (2015) Metamodeling of dynamic nonlinear structural systems through polynomial chaos NARX models. Comput Struct 157:99–113
Tikhonov AN, Arsenin VIA (1977) Solutions of ill-posed problems. Scripta series in mathematics, Winston ISBN 9780470991244
Wan X, Karniadakis G (2006) Multi-element generalized polynomial chaos for arbitrary probability measures. SIAM J Sci Comput 28(3):901–928
Wang S, Li G, Iskandarani M, Le Hénaff M, Knio OM (2018) Verifying and assessing the performance of the perturbation strategy in polynomial chaos ensemble forecasts of the circulation in the Gulf of Mexico. Ocean Model Rev 131:59–70
Winokur J, Conrad P, Sraj I, Knio O, Srinivasan A, Thacker WC, Marzouk Y, Iskandarani M (2013) A priori testing of sparse adaptive polynomial chaos expansions using an ocean general circulation model database. Comput Geosci 17(6):899–911

Acknowledgements

The work of P. Sochala is supported by funding from BRGM (French Geological Survey) through its Institut Carnot, sponsored by the ANR (French National Research Agency). This research was made possible in part by a grant from The Gulf of Mexico Research Initiative, and in part by NASA-NNX13AE30G and NSF1639722. Data are publicly available through the Gulf of Mexico Research Initiative Information & Data Cooperative (GRIIDC) at https://data.gulfresearchinitiative.org (https://doi.org/10.7266/n7-d8ga-6c22). The authors are grateful to S. Wang for performing the HYCOM simulations and to O. Le Maître for fruitful discussions about the clustering approach in the high-variance case.

Author information

Authors and Affiliations

BRGM, 3 avenue Claude Guillemin, 45060, Orléans, France
Pierre Sochala
Rosenstiel School of Marine and Atmospheric Science, University of Miami, Miami, FL, 33149, USA
Mohamed Iskandarani

Corresponding author

Correspondence to Pierre Sochala.

Appendix A: Cross-Validation Technique

Cross-validation (CV) is a popular technique used in statistics and machine learning to assess the predictive capacity of a model. The principle of CV is to partition the data into two complementary subsets, to build the model on one subset (the training set), and then to test the model on the other subset (the validation set). This procedure is repeated several times with different partitions of the data. In the leave-one-out version of CV, each validation set consists of a single training point, and the predicted residual $e_{[i]}$ at $\varvec{\xi }^{(i)}$ is defined as

$$\begin{aligned} e_{[i]} = u(\varvec{\xi }^{(i)}) - \tilde{u}_{[i]}(\varvec{\xi }^{(i)}), \end{aligned}$$

where $u(\varvec{\xi }^{(i)})$ is the true value, and $\tilde{u}_{[i]}(\varvec{\xi }^{(i)})$ denotes the value predicted by the model $\tilde{u}_{[i]}$ built after removing the point $\varvec{\xi }^{(i)}$ from the training set. The leave-one-out error $E_{\mathrm{loo}}$, also known as the predicted residual error sum of squares, is estimated by the empirical mean square of the predicted residuals,

$$\begin{aligned} E_{\mathrm{loo}} = \frac{1}{N}\sum _{i=1}^N e_{[i]}^2. \end{aligned}$$
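The definition above can be illustrated with a minimal brute-force sketch that refits the model N times, once per left-out point. It is only an illustration under stated assumptions: the function names `design_matrix` and `loo_error_brute_force`, the one-dimensional input, the monomial basis (used here in place of an orthogonal polynomial chaos basis) and the synthetic test function are all hypothetical choices, not taken from the article.

```python
import numpy as np

def design_matrix(xi, degree):
    """Design matrix with columns 1, xi, ..., xi**degree (monomials, an assumption)."""
    return np.vander(xi, degree + 1, increasing=True)

def loo_error_brute_force(xi, u, degree):
    """Return (E_loo, residuals) by refitting N models, each missing one point."""
    n = len(xi)
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                      # training set without point i
        coeffs, *_ = np.linalg.lstsq(
            design_matrix(xi[keep], degree), u[keep], rcond=None
        )
        prediction = design_matrix(xi[i:i + 1], degree) @ coeffs
        residuals[i] = u[i] - prediction[0]           # e_[i] = u - u_tilde_[i]
    return np.mean(residuals**2), residuals           # E_loo = (1/N) sum e_[i]^2

rng = np.random.default_rng(0)
xi = rng.uniform(-1.0, 1.0, size=20)                  # hypothetical sample of the input
u = np.sin(np.pi * xi)                                # hypothetical stand-in for the model
e_loo, e = loo_error_brute_force(xi, u, degree=3)
```

The loop makes the cost of the general case explicit: one least-squares solve per training point.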

In the general case, CV can be an expensive technique due to the construction of the N models $\tilde{u}_{[i]}$. However, a fast computation of $E_{\mathrm{loo}}$ is possible in linear regression models (Seber and Lee 2003) by using the relation

$$\begin{aligned} e_{[i]} = \frac{u(\varvec{\xi }^{(i)}) - \tilde{u}(\varvec{\xi }^{(i)})}{1-h_i}, \end{aligned}$$

where $\tilde{u}(\varvec{\xi }^{(i)})$ is the prediction of the single model built with all the training points, and $h_i$ is the i-th diagonal term of the hat matrix $H=P(P^{\top }P)^{-1}P^{\top }$, with P the design matrix of the linear regression. In practice, the vector $\varvec{e}_{[]}$ of the predicted residuals can be directly computed from the vector of model outputs $\varvec{u}=[u(\varvec{\xi }^{(i)})]\in \mathbb {R}^N$ as follows

$$\begin{aligned} \varvec{e}_{[]} = \left[ (I-H)\,\varvec{u}\right] \oslash \left[ \varvec{1}-\mathrm{diag}(H)\right] , \end{aligned}$$

where $I=[\delta _{ij}]\in \mathbb {R}^{N,N}$ is the identity matrix, $\oslash $ denotes the component-wise division, $\varvec{1}=[1]^{\top }\in {\mathbb {R}^N}$ is the vector of ones, and $\mathrm{diag}(H)\in \mathbb {R}^N$ is the vector of diagonal entries of H.
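The fast relation can be sketched in a few lines by applying the scalar formula to all points at once, that is, by dividing $(I-H)\varvec{u}$ component-wise by $\varvec{1}-\mathrm{diag}(H)$. The sketch below is hedged: the function name `loo_residuals_fast`, the monomial design matrix and the synthetic outputs are hypothetical, and the use of a thin QR factorisation (so that $H=QQ^{\top }$) is a numerical convenience assumed here rather than something stated in the article. It also assumes P has full column rank.

```python
import numpy as np

def loo_residuals_fast(P, u):
    """Leave-one-out residuals from a single ordinary least-squares fit.

    Evaluates e = [(I - H) u] / [1 - diag(H)] with H = P (P^T P)^{-1} P^T.
    """
    # Thin QR, P = Q R, gives H = Q Q^T without forming (P^T P)^{-1} explicitly.
    Q, _ = np.linalg.qr(P)
    h = np.sum(Q**2, axis=1)        # diag(H), the leverages h_i
    fitted = Q @ (Q.T @ u)          # H u, predictions of the model built on all points
    return (u - fitted) / (1.0 - h)

rng = np.random.default_rng(1)
xi = rng.uniform(-1.0, 1.0, size=20)        # hypothetical sample of the input
P = np.vander(xi, 4, increasing=True)       # hypothetical cubic monomial basis
u = np.exp(xi)                              # hypothetical stand-in for the model outputs
e = loo_residuals_fast(P, u)
e_loo = np.mean(e**2)
```

Only one factorisation of P is needed, whatever the number of training points, which is what makes the leave-one-out error cheap for linear regression models.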

About this article

Cite this article

Sochala, P., Iskandarani, M. On the Construction of Uncertain Time Series Surrogates Using Polynomial Chaos and Gaussian Processes. Math Geosci 52, 285–309 (2020). https://doi.org/10.1007/s11004-019-09806-8

Received: 14 December 2018
Accepted: 22 April 2019
Published: 20 May 2019
Issue Date: February 2020
DOI: https://doi.org/10.1007/s11004-019-09806-8
