Abstract
The analysis of time series is a fundamental task in many flow simulations such as oceanic and atmospheric flows. A major challenge is the design of a faithful and accurate time-dependent surrogate built with a tractable sample set and a manageable number of degrees of freedom. Several techniques are implemented to handle the time-dependent aspect of the quantity of interest, including uncoupled approaches, low-rank approximations, auto-regressive models and global Bayesian emulators. These approaches rely on two popular methods for uncertainty quantification: polynomial chaos and Gaussian process regression. The different techniques are tested and compared on the uncertain evolution of the sea surface height forecast at two locations exhibiting contrasting levels of variance. Two ensemble sizes are considered, as well as two versions of polynomial chaos (ordinary least squares or ridge regression) and of Gaussian processes (squared exponential or Matérn covariance function), in order to assess their impact on the results. The conclusions focus on the advantages and drawbacks of the different techniques in terms of accuracy, flexibility and computational cost.
References
Alemazkoor N, Meidani H (2017) Divide and conquer: an incremental sparsity promoting compressive sampling approach for polynomial chaos expansions. Comput Methods Appl Mech Eng 318:937–956
Alexanderian A, Le Maître O, Najm H, Iskandarani M, Knio O (2012) Multiscale stochastic preconditioners in non-intrusive spectral projection. SIAM J Sci Comput 50(2):306–340
Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 230(6):2345–2367
Bleck R (2002) An oceanic general circulation model framed in hybrid isopycnic-cartesian coordinates. Ocean Model 4(1):55–88
Bowman A, Azzalini A (1997) Applied smoothing techniques for data analysis. Oxford University Press, New York
Cameron RH, Martin WT (1947) The orthogonal development of nonlinear functionals in series of Fourier–Hermite functionals. Ann Math 48:385–392
Chassignet E, Smith L, Halliwell G, Bleck R (2003) North Atlantic simulation with the hybrid coordinate ocean model (HYCOM): impact of the vertical coordinate choice, reference density, and thermobaricity. J Phys Oceanogr 33:2504–2526
Conrad PR, Marzouk YM (2013) Adaptive Smolyak pseudospectral approximations. SIAM J Sci Comput 35(6):A2643–A2670
Conti S, O’Hagan A (2010) Bayesian emulation of complex multi-output and dynamic computer models. J Stat Plan Infer 140(3):640–651
Doostan A, Owhadi H (2011) A non-adapted sparse approximation of PDEs with stochastic inputs. J Comput Phys 230(8):3015–3034
Ernst OG, Mugler A, Starkloff HJ, Ullmann E (2012) On the convergence of generalized polynomial chaos expansions. ESAIM Math Model Numer Anal 46:317–339
Gerritsma M, van der Steen JB, Vos P, Karniadakis G (2010) Time-dependent generalized polynomial chaos. J Comput Phys 229(22):8333–8363
Ghanem RG, Spanos PD (1991) Stochastic finite elements: a spectral approach. Springer, Berlin
Gibbs MN (1997) Bayesian Gaussian processes for regression and classification. Ph.D. thesis, Department of Physics, University of Cambridge
Greengard L, Rokhlin V (1987) A fast algorithm for particle simulations. J Comput Phys 73(2):325–348
Iskandarani M, Le Hénaff M, Thacker WC, Srinivasan A, Knio OM (2016a) Quantifying uncertainty in Gulf of Mexico forecasts stemming from uncertain initial conditions. J Geophys Res Oceans 121(7):4819–4832
Iskandarani M, Wang S, Srinivasan A, Thacker WC, Winokur J, Knio O (2016b) An overview of uncertainty quantification techniques with application to oceanic and oil-spill simulations. J Geophys Res Oceans 121(4):2789–2808
Kocijan J, Girard A, Banko B, Murray-Smith R (2005) Dynamic systems identification with Gaussian processes. Math Comput Model Dyn Syst 11(4):411–424
Le Gratiet L, Marelli S, Sudret B (2016) Metamodel-based sensitivity analysis: polynomial chaos expansions and Gaussian processes. Springer, Cham, pp 1–37 ISBN 978-3-319-11259-6
Le Maître O, Najm H, Ghanem R, Knio O (2004) Multi-resolution analysis of Wiener-type uncertainty propagation schemes. J Comput Phys 197(2):502–531
Le Maître OP, Knio OM (2010) Spectral methods for uncertainty quantification. Springer, Berlin
Li G, Iskandarani M, Le Hénaff M, Winokur J, Le Maître OP, Knio OM (2016) Quantifying initial and wind forcing uncertainties in the Gulf of Mexico. Comput Geosci 20(5):1133–1153
Lorenz EN (1956) Empirical orthogonal functions and statistical weather prediction. Scientific report / MIT, Statistical Forecasting Project, Massachusetts Institute of Technology, Department of Meteorology
Mai CV, Spiridonakos MD, Chatzi EN, Sudret B (2016) Surrogate modeling for stochastic dynamical systems by combining nonlinear autoregressive with exogeneous input models and polynomial chaos expansions. Int J Uncertain Quant 6(4):313–339
Matheron G (1973) The intrinsic random functions and their applications. Adv Appl Probab 5(3):439–468
McLachlan GJ, Peel D (2000) Finite mixture models. Wiley, New York
Morokoff WJ, Caflisch RE (1995) Quasi-Monte Carlo integration. J Comput Phys 122(2):218–230
Neal RM (1996) Bayesian learning for neural networks. Springer, Berlin ISBN 0387947248
Nocedal J, Wright SJ (2006) Numerical optimization, 2nd edn. Springer, Berlin
Owen NE, Challenor P, Menon PP, Bennani S (2017) Comparison of surrogate-based uncertainty quantification methods for computationally expensive simulators. SIAM/ASA J Uncertain Quant 5(1):403–435
Pronzato L, Müller WG (2012) Design of computer experiments: space filling and beyond. Stat Comput 22(3):681–701
Rasmussen CE, Williams CKI (2005) Gaussian processes for machine learning. The MIT Press, Cambridge ISBN 026218253X
Roy PT, Moçayd NE, Ricci S, Jouhaud JC, Goutal N, De Loco M, Rochoux MC (2017) Comparison of polynomial chaos and Gaussian process surrogates for uncertainty quantification and correlation estimation of spatially distributed open-channel steady flows. Stoch Env Res Risk A ISSN 1436–3259
Sampson PD, Guttorp P (1992) Nonparametric estimation of nonstationary spatial covariance structure. J Am Stat Assoc 87(417):108–119
Santner TJ, Williams B, Notz W (2003) The design and analysis of computer experiments. Springer, Berlin
Seber GAF, Lee AJ (2003) Linear regression analysis. Wiley, New York ISBN 9780471722199
Smolyak S (1963) Quadrature and interpolation formulas for tensor products of certain classes of functions. Dokl Akad Nauk SSSR 4(240–243):123
Sochala P, De Martin F (2017) Surrogate combining harmonic decomposition and polynomial chaos for seismic shear waves in uncertain media. Comput Geosci 22(1):125–144
Spiridonakos MD, Chatzi EN (2015) Metamodeling of dynamic nonlinear structural systems through polynomial chaos NARX models. Comput Struct 157:99–113
Tikhonov AN, Arsenin VIA (1977) Solutions of ill-posed problems. Scripta series in mathematics, Winston ISBN 9780470991244
Wan X, Karniadakis G (2006) Multi-element generalized polynomial chaos for arbitrary probability measures. SIAM J Sci Comput 28(3):901–928
Wang S, Li G, Iskandarani M, Le Hénaff M, Knio OM (2018) Verifying and assessing the performance of the perturbation strategy in polynomial chaos ensemble forecasts of the circulation in the Gulf of Mexico. Ocean Model Rev 131:59–70
Winokur J, Conrad P, Sraj I, Knio O, Srinivasan A, Thacker WC, Marzouk Y, Iskandarani M (2013) A priori testing of sparse adaptive polynomial chaos expansions using an ocean general circulation model database. Comput Geosci 17(6):899–911
Acknowledgements
The work of P. Sochala is supported by funding from BRGM (French Geological Survey) through its Institut Carnot, sponsored by the ANR (French National Research Agency). This research was made possible in part by a grant from The Gulf of Mexico Research Initiative, and in part by NASA-NNX13AE30G and NSF1639722. Data are publicly available through the Gulf of Mexico Research Initiative Information & Data Cooperative (GRIIDC) at https://data.gulfresearchinitiative.org (https://doi.org/10.7266/n7-d8ga-6c22). The authors are grateful to S. Wang for having performed the HYCOM simulations and to O. Le Maître for fruitful discussions about the clustering approach in the high-variance case.
Appendix A: Cross-Validation Technique
Cross-validation (CV) is a popular technique in statistics and machine learning for assessing the predictive capacity of a model. The principle of CV is to partition the data into two complementary subsets, build the model on one subset (the training set), and test it on the other (the validation set). This procedure is repeated several times with different partitions of the data. In the leave-one-out version of CV, the predicted residual \(e_{[i]}\) at \(\varvec{\xi }^{(i)}\) is defined as
\[ e_{[i]} = u(\varvec{\xi }^{(i)}) - \tilde{u}_{[i]}(\varvec{\xi }^{(i)}), \]
where \(u(\varvec{\xi }^{(i)})\) is the true value, and \(\tilde{u}_{[i]}(\varvec{\xi }^{(i)})\) denotes the value predicted by the model \(\tilde{u}_{[i]}\) built after removing the point \(\varvec{\xi }^{(i)}\) from the training set. The leave-one-out error \(E_{\mathrm{loo}}\), also known as the predicted residual error sum of squares, is estimated by the empirical mean square of the predicted residuals,
\[ E_{\mathrm{loo}} = \frac{1}{N}\sum _{i=1}^{N} e_{[i]}^{2}. \]
In the general case, CV can be expensive because it requires the construction of the N models \(\tilde{u}_{[i]}\). However, a fast computation of \(E_{\mathrm{loo}}\) is possible for linear regression models (Seber and Lee 2003) by using the relation
\[ e_{[i]} = \frac{u(\varvec{\xi }^{(i)}) - \tilde{u}(\varvec{\xi }^{(i)})}{1-h_i}, \]
where \(\tilde{u}\) is the single model built with all the training points, and \(h_i\) is the i-th diagonal term of the hat matrix \(H=P(P^{\top }P)^{-1}P^{\top }\), with P the design matrix of the linear regression. In practice, the vector \(\varvec{e}_{[]}\) of predicted residuals can be computed directly from the model outputs as follows
\[ \varvec{e}_{[]} = \left( (I-H)\,\varvec{u}\right) \oslash \left( \varvec{1} - \mathrm{diag}(H)\right) , \]
where \(\varvec{u}=[u(\varvec{\xi }^{(i)})]\in \mathbb {R}^N\) is the vector of model outputs at the training points, \(I=[\delta _{ij}]\in \mathbb {R}^{N,N}\) is the identity matrix, \(\oslash \) denotes the component-wise division, \(\varvec{1}=[1]^{\top }\in {\mathbb {R}^N}\) is the vector of ones, and \(\mathrm{diag}(H)\) is the vector formed by the diagonal of H.
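As an illustration of the fast leave-one-out computation, the following minimal Python sketch compares the hat-matrix shortcut with the brute-force procedure that rebuilds the regression N times. It is not the authors' implementation. The one-dimensional Legendre basis, the test function, the noise level, the sample size and all function names are assumptions chosen only for the example, and the diagonal of H is obtained from a thin QR factorization of P rather than from the explicit inverse, which assumes that P has full column rank.

```python
import numpy as np


def loo_residuals_fast(P, u):
    """Leave-one-out residuals of an ordinary least squares fit,
    obtained from a single regression via the hat-matrix relation."""
    # Thin QR factorization: for full-column-rank P, H = Q Q^T.
    Q, _ = np.linalg.qr(P)
    h = np.sum(Q**2, axis=1)        # diag(H)
    residual = u - Q @ (Q.T @ u)    # (I - H) u
    return residual / (1.0 - h)     # component-wise division


def loo_residuals_brute(P, u):
    """Leave-one-out residuals obtained by refitting N times."""
    n = P.shape[0]
    e = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i    # drop the i-th training point
        coef, *_ = np.linalg.lstsq(P[keep], u[keep], rcond=None)
        e[i] = u[i] - P[i] @ coef   # true value minus prediction
    return e


# Hypothetical setup: 30 samples of one uniform input, degree-4 Legendre basis.
rng = np.random.default_rng(0)
xi = rng.uniform(-1.0, 1.0, size=30)
P = np.polynomial.legendre.legvander(xi, 4)   # design matrix, shape (30, 5)
u = np.exp(xi) + 0.05 * rng.standard_normal(xi.size)

e_fast = loo_residuals_fast(P, u)
e_brute = loo_residuals_brute(P, u)
E_loo = np.mean(e_fast**2)                    # leave-one-out error
```

For ordinary least squares, the two residual vectors agree up to round-off, while the fast version requires one factorization of P instead of N separate fits.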
Cite this article
Sochala, P., Iskandarani, M. On the Construction of Uncertain Time Series Surrogates Using Polynomial Chaos and Gaussian Processes. Math Geosci 52, 285–309 (2020). https://doi.org/10.1007/s11004-019-09806-8
DOI: https://doi.org/10.1007/s11004-019-09806-8