Abstract
Engineering computer codes are often computationally expensive. To lighten this load, we exploit new covariance kernels that allow such codes to be replaced by cheap-to-evaluate surrogate models. For high-dimensional input spaces, using the kriging model in the standard way is computationally expensive because a large covariance matrix must be inverted several times to estimate the model parameters. We address this issue by constructing a covariance kernel that depends on only a few parameters, built from information provided by the Partial Least Squares (PLS) method. Promising results are obtained on numerical examples with up to 100 dimensions: significant computational gains are achieved while sufficient accuracy is maintained.
Abbreviations
Matrices and vectors are in bold type.

- det: Determinant of a matrix
- |⋅|: Absolute value
- ℝ: Set of real numbers
- ℝ⁺: Set of positive real numbers
- n: Number of sampling points
- d: Number of dimensions
- h: Number of principal components retained
- x: 1 × d vector
- x_j: jth element of a vector x
- X: n × d matrix containing the sampling points
- y: n × 1 vector containing the simulations of X
- x^(i): ith training point, for i = 1,…,n (a 1 × d vector)
- w^(l): d × 1 vector containing the X weights given by the lth PLS iteration, for l = 1,…,h
- X^(0): X
- X^(l−1): Matrix containing the residuals of the inner regression of the (l − 1)st PLS iteration, for l = 1,…,h
- k(⋅, ⋅): Covariance function
- (0, k(⋅, ⋅)): Distribution of a Gaussian process with mean function 0 and covariance function k(⋅, ⋅)
- x^t: The superscript t denotes transposition of the vector x
References
Alberto P, González F (2012) Partial Least Squares regression on symmetric positive-definite matrices. Rev Col Estad 36(1):177–192
Bachoc F (2013) Cross Validation and Maximum Likelihood estimation of hyper-parameters of Gaussian processes with model misspecification. Comput Stat Data Anal 66:55–69
Bishop CM (2007) Pattern recognition and machine learning (information science and statistics). Springer
Braham H, Ben Jemaa S, Sayrac B, Fort G, Moulines E (2014) Low complexity spatial interpolation for cellular coverage analysis. In: 2014 12th international symposium on modeling and optimization in mobile, ad hoc, and wireless networks (WiOpt). IEEE, pp 188–195
Buhmann MD (2003) Radial basis functions: theory and implementations, vol 12. Cambridge University Press, Cambridge
Cressie N (1988) Spatial prediction and ordinary kriging. Math Geol 20(4):405–421
Damianou A, Lawrence ND (2013) Deep Gaussian processes. In: Proceedings of the sixteenth international conference on artificial intelligence and statistics, AISTATS 2013, Scottsdale, pp 207–215
Durrande N (2011) Covariance kernels for simplified and interpretable modeling. A functional and probabilistic approach. PhD thesis, École Nationale Supérieure des Mines de Saint-Étienne
Durrande N, Ginsbourger D, Roustant O (2012) Additive covariance kernels for high-dimensional Gaussian process modeling. Ann Fac Sci Toulouse Math 21(3):481–499
Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148
Goovaerts P (1997) Geostatistics for natural resources evaluation (applied geostatistics). Oxford University Press, New York
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall PTR, Upper Saddle River
Helland I (1988) On structure of partial least squares regression. Commun Stat - Simul Comput 17:581–607
Hensman J, Fusi N, Lawrence ND (2013) Gaussian processes for big data. In: Proceedings of the twenty-ninth conference on uncertainty in artificial intelligence, Bellevue, 2013
Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492
Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bur Stand 45(4):255–282
Liem RP, Martins JRRA (2014) Surrogate models and mixtures of experts in aerodynamic performance prediction for mission analysis. In: 15th AIAA/ISSMO multidisciplinary analysis and optimization conference, Atlanta, GA, AIAA-2014-2301
Manne R (1987) Analysis of two Partial-Least-Squares algorithms for multivariate calibration. Chemom Intell Lab Syst 2(1–3):187–197
Mera NS (2007) Efficient optimization processes using kriging approximation models in electrical impedance tomography. Int J Numer Methods Eng 69(1):202–220
Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4:1–32
Noesis Solutions (2009) OPTIMUS. http://www.noesissolutions.com/Noesis/optimus-details/optimus-design-optimization
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Picheny V, Ginsbourger D, Roustant O, Haftka RT, Kim NH (2010) Adaptive designs of experiments for accurate approximation of a target region. J Mech Des 132(7):071008
Powell MJ (1994) A direct search optimization method that models the objective and constraint functions by linear interpolation. In: Advances in optimization and numerical analysis. Springer, pp 51–67
Rasmussen C, Williams C (2006) Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, Cambridge
Regis R, Shoemaker C (2013) Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Eng Optim 45(5):529–555
Roustant O, Ginsbourger D, Deville Y (2012) DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. J Stat Softw 51(1):1–55
Sakata S, Ashida F, Zako M (2004) An efficient algorithm for Kriging approximation and optimization with large-scale sampling data. Comput Methods Appl Mech Eng 193(3):385–404
Sasena M (2002) Flexibility and efficiency enhancements for constrained global design optimization with Kriging approximations. PhD thesis, University of Michigan
Schonlau M (1998) Computer experiments and global optimization. PhD thesis, University of Waterloo
Wahba G (1990) Spline models for observational data, CBMS-NSF regional conference series in applied mathematics, vol 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia
Wahba G, Craven P (1978) Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–404
Zimmerman DL, Homer KE (1991) A network design criterion for estimating selected attributes of the semivariogram. Environmetrics 2(4):425–441
Acknowledgments
The authors thank the anonymous reviewers for their insightful and constructive comments. We also extend our grateful thanks to A. Chiplunkar from ISAE SUPAERO, Toulouse and R. G. Regis from Saint Joseph’s University, Philadelphia for their careful correction of the manuscript and to SNECMA for providing the tables of experiment results. Finally, B. Kraabel is gratefully acknowledged for carefully reviewing the paper prior to publication.
Appendices
Appendix A: Equations for ordinary kriging model
The expression (3) for the ordinary kriging model is transformed into (see Forrester et al. 2008)

ŷ(x) = μ̂ + r_x^t R⁻¹ (y − 1μ̂),

where 1 denotes an n-vector of ones, R = [k(x^(i), x^(j))] is the n × n covariance matrix of the training points, r_x = [k(x, x^(i))] is the n-vector of covariances between x and the training points, and

μ̂ = (1^t R⁻¹ 1)⁻¹ 1^t R⁻¹ y.

In addition, (6) is written as

σ̂² = (1/n) (y − 1μ̂)^t R⁻¹ (y − 1μ̂).
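The standard ordinary kriging predictor ŷ(x) = μ̂ + r_x^t R⁻¹(y − 1μ̂), with μ̂ = (1^t R⁻¹ 1)⁻¹ 1^t R⁻¹ y, can be checked numerically. The sketch below (a hypothetical 1-D example with a Gaussian correlation function, not one of the paper's test cases) verifies that the predictor interpolates the training data:

```python
import numpy as np

def ok_predict(X, y, x_new, theta=50.0, nugget=1e-12):
    """Ordinary kriging predictor with a Gaussian correlation function.

    mu_hat    = (1^t R^-1 1)^-1 1^t R^-1 y
    y_hat(x)  = mu_hat + r_x^t R^-1 (y - 1 mu_hat)
    """
    def corr(A, B):
        return np.exp(-theta * (A[:, None] - B[None, :]) ** 2)

    n = len(X)
    R = corr(X, X) + nugget * np.eye(n)     # lightly regularized correlation matrix
    Ri = np.linalg.inv(R)
    one = np.ones(n)
    mu = (one @ Ri @ y) / (one @ Ri @ one)  # generalized least-squares mean
    r = corr(np.atleast_1d(x_new), X)       # correlations with training points
    return mu + r @ Ri @ (y - one * mu)

X = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * X)
y_hat = ok_predict(X, y, X)                 # predict back at the training points
```

Because kriging interpolates, `y_hat` reproduces `y` (up to the small nugget) at the sampling points.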
Appendix B: Examples of kernels
Table 12 lists the most popular examples of stationary kernels, and Table 13 gives the new KPLS kernels built from the examples in Table 12.
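The tables themselves are not reproduced in this excerpt. For reference, the sketch below gives the standard one-dimensional forms of the usual stationary kernels (θ > 0 a hyperparameter, h the distance between two points); these are the textbook definitions, not copied from Table 12:

```python
import numpy as np

# Common stationary 1-D kernels as functions of the distance h:
def exponential(h, theta):
    """k(h) = exp(-theta * |h|)"""
    return np.exp(-theta * np.abs(h))

def generalized_exponential(h, theta, p):
    """k(h) = exp(-theta * |h|^p), with 0 < p <= 2"""
    return np.exp(-theta * np.abs(h) ** p)

def gaussian(h, theta):
    """Squared exponential: the p = 2 special case."""
    return np.exp(-theta * h ** 2)

h = np.linspace(-2.0, 2.0, 101)
```

Each kernel equals 1 at h = 0 and decays with |h|; the generalized exponential interpolates between the exponential (p = 1) and Gaussian (p = 2) kernels.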
Appendix C: Proof of the kernel equivalence
For l = 1,…,h, the k_l are separable kernels (d-dimensional tensor products) of the same type, so there exist one-dimensional kernels ϕ_{l1},…,ϕ_{ld} such that

k_l(F_l(x), F_l(x′)) = ∏_{i=1,…,d} ϕ_{li}(F_l(x)_i, F_l(x′)_i),   (20)

where F_l(x)_i is the ith coordinate of F_l(x). If we insert (20) into (14), we get

k_{KPLS1:h}(x, x′) = ∏_{l=1,…,h} ∏_{i=1,…,d} ϕ_{li}(F_l(x)_i, F_l(x′)_i) = ∏_{i=1,…,d} ψ_i(x_i, x′_i),

with

ψ_i(x_i, x′_i) = ∏_{l=1,…,h} ϕ_{li}(F_l(x)_i, F_l(x′)_i)

corresponding to a one-dimensional kernel, since F_l(x)_i depends on x only through x_i. Hence, k_{KPLS1:h} is a separable kernel. In particular, if we consider a generalized exponential kernel with p_1 = ⋯ = p_h = p ∈ [0, 2], we obtain

k_{KPLS1:h}(x, x′) = σ² ∏_{j=1,…,d} exp(−η_j |x_j − x′_j|^p),

with

η_j = ∑_{l=1,…,h} θ_l |w_j^(l)|^p.

We thus obtain a d-dimensional generalized exponential kernel with hyperparameters η_1,…,η_d.
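The equivalence is easy to verify numerically. In the sketch below (with random stand-ins for the PLS weight vectors w^(l), not actual PLS output), the product of h generalized exponential kernels on the weighted inputs collapses to a single d-dimensional generalized exponential kernel with parameters η_j = Σ_l θ_l |w_j^(l)|^p:

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, p = 5, 2, 1.5
W = rng.normal(size=(d, h))       # stand-ins for the PLS weight vectors w^(l)
theta = np.array([0.7, 1.3])      # one hyperparameter per PLS component
x, xp = rng.uniform(size=d), rng.uniform(size=d)

# Product form: one generalized exponential kernel per PLS component
k_prod = 1.0
for l in range(h):
    k_prod *= np.exp(-theta[l] * np.sum(np.abs(W[:, l] * (x - xp)) ** p))

# Collapsed form: a single d-dimensional kernel with eta_j = sum_l theta_l |w_j^(l)|^p
eta = (theta[None, :] * np.abs(W) ** p).sum(axis=1)
k_single = np.exp(-np.sum(eta * np.abs(x - xp) ** p))
```

The two forms agree because |w_j^(l) (x_j − x′_j)|^p = |w_j^(l)|^p |x_j − x′_j|^p, so the exponents can be regrouped by dimension.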
Appendix D: Results for the Griewank function in 20D and 60D over the interval [−5, 5]
In Tables 14 and 15, the mean and standard deviation (std) of the numerical experiments with the Griewank function are given for 20 and 60 dimensions, respectively. To better visualize the results, boxplots are used in Figs. 7–11.
About this article
Cite this article
Bouhlel, M.A., Bartoli, N., Otsmane, A. et al. Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction. Struct Multidisc Optim 53, 935–952 (2016). https://doi.org/10.1007/s00158-015-1395-9