Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction

  • REVIEW ARTICLE
  • Published in Structural and Multidisciplinary Optimization

Abstract

Engineering computer codes are often computationally expensive. To lighten this load, we exploit new covariance kernels to replace the expensive codes with surrogate models. For input spaces of large dimension, using the kriging model in the standard way is computationally expensive, because a large covariance matrix must be inverted several times to estimate the model parameters. We address this issue by constructing a covariance kernel that depends on only a few parameters, built from information provided by the Partial Least Squares (PLS) method. Promising results are obtained for numerical examples with up to 100 dimensions, with significant computational gains while maintaining sufficient accuracy.


Abbreviations

Matrices and vectors are in bold type.

Symbol : Meaning

det : Determinant of a matrix
|⋅| : Absolute value
ℝ : Set of real numbers
ℝ+ : Set of positive real numbers
n : Number of sampling points
d : Number of dimensions
h : Number of principal components retained
x : d-dimensional vector
x_j : jth element of a vector x
X : n × d matrix containing the sampling points
y : n × 1 vector containing the simulation results at the points in X
x^(i) : ith training point, for i = 1,…,n (a 1 × d vector)
w^(l) : d × 1 vector containing the X-weights given by the lth PLS iteration, for l = 1,…,h
X^(0) : X
X^(l−1) : Matrix containing the residual of the inner regression of the (l−1)st PLS iteration, for l = 1,…,h
k(⋅, ⋅) : Covariance function
GP(0, k(⋅, ⋅)) : Distribution of a Gaussian process with mean function 0 and covariance function k(⋅, ⋅)
x^t : The superscript t denotes the transpose of the vector x
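The PLS quantities above, in particular the weight vectors w^(l), can be computed with the scikit-learn implementation cited in the references (Pedregosa et al. 2011). The following is a minimal sketch with arbitrary toy data, not the authors' code; all names and values are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Toy data: n = 50 sampling points in d = 10 dimensions (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 10))          # n x d matrix of sampling points
y = np.sum(X[:, :3] ** 2, axis=1, keepdims=True)   # n x 1 vector of responses

h = 3                                               # number of principal components retained
pls = PLSRegression(n_components=h).fit(X, y)

# Columns of x_weights_ are the weight vectors w^(l), l = 1, ..., h (each d x 1).
W = pls.x_weights_                                  # shape (d, h)
print(W.shape)                                      # (10, 3)
```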

References

  • Alberto P, González F (2012) Partial Least Squares regression on symmetric positive-definite matrices. Rev Col Estad 36(1):177–192

  • Bachoc F (2013) Cross Validation and Maximum Likelihood estimation of hyper-parameters of Gaussian processes with model misspecification. Comput Stat Data Anal 66:55–69

  • Bishop CM (2007) Pattern recognition and machine learning (information science and statistics). Springer

  • Braham H, Ben Jemaa S, Sayrac B, Fort G, Moulines E (2014) Low complexity spatial interpolation for cellular coverage analysis. In: 2014 12th international symposium on modeling and optimization in mobile, ad hoc, and wireless networks (WiOpt). IEEE, pp 188–195

  • Buhmann MD (2003) Radial basis functions: theory and implementations, vol 12. Cambridge University Press, Cambridge

  • Cressie N (1988) Spatial prediction and ordinary kriging. Math Geol 20(4):405–421

  • Damianou A, Lawrence ND (2013) Deep Gaussian processes. In: Proceedings of the sixteenth international conference on artificial intelligence and statistics, AISTATS 2013, Scottsdale, pp 207–215

  • Durrande N (2011) Covariance kernels for simplified and interpretable modeling. A functional and probabilistic approach. PhD thesis, École Nationale Supérieure des Mines de Saint-Étienne

  • Durrande N, Ginsbourger D, Roustant O (2012) Additive covariance kernels for high-dimensional Gaussian process modeling. Ann Fac Sci Toulouse Math 21(3):481–499

  • Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley, New York

  • Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148

  • Goovaerts P (1997) Geostatistics for natural resources evaluation (applied geostatistics). Oxford University Press, New York

  • Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall PTR, Upper Saddle River

  • Helland I (1988) On structure of partial least squares regression. Commun Stat - Simul Comput 17:581–607

  • Hensman J, Fusi N, Lawrence ND (2013) Gaussian processes for big data. In: Proceedings of the twenty-ninth conference on uncertainty in artificial intelligence, Bellevue, 2013

  • Jones DR, Schonlau M, Welch WJ (1998) Efficient global optimization of expensive black-box functions. J Glob Optim 13(4):455–492

  • Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bur Stand 45(4):255–282

  • Liem RP, Martins JRRA (2014) Surrogate models and mixtures of experts in aerodynamic performance prediction for mission analysis. In: 15th AIAA/ISSMO multidisciplinary analysis and optimization conference, Atlanta, GA, AIAA-2014-2301

  • Manne R (1987) Analysis of two Partial-Least-Squares algorithms for multivariate calibration. Chemom Intell Lab Syst 2(1–3):187–197

  • Mera NS (2007) Efficient optimization processes using kriging approximation models in electrical impedance tomography. Int J Numer Methods Eng 69(1):202–220

  • Michalewicz Z, Schoenauer M (1996) Evolutionary algorithms for constrained parameter optimization problems. Evol Comput 4:1–32

  • Noesis Solutions (2009) OPTIMUS. http://www.noesissolutions.com/Noesis/optimus-details/optimus-design-optimization

  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

  • Picheny V, Ginsbourger D, Roustant O, Haftka RT, Kim NH (2010) Adaptive designs of experiments for accurate approximation of a target region. J Mech Des 132(7):071008

  • Powell MJ (1994) A direct search optimization method that models the objective and constraint functions by linear interpolation. In: Advances in optimization and numerical analysis. Springer, pp 51–67

  • Rasmussen C, Williams C (2006) Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, Cambridge

  • Regis R, Shoemaker C (2013) Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Eng Optim 45(5):529–555

  • Roustant O, Ginsbourger D, Deville Y (2012) DiceKriging, DiceOptim: two R packages for the analysis of computer experiments by kriging-based metamodeling and optimization. J Stat Softw 51(1):1–55

  • Sakata S, Ashida F, Zako M (2004) An efficient algorithm for Kriging approximation and optimization with large-scale sampling data. Comput Methods Appl Mech Eng 193(3):385–404

  • Sasena M (2002) Flexibility and efficiency enhancements for constrained global design optimization with Kriging approximations. PhD thesis, University of Michigan

  • Schonlau M (1998) Computer experiments and global optimization. PhD thesis, University of Waterloo

  • Wahba G (1990) Spline models for observational data, CBMS-NSF regional conference series in applied mathematics, vol 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia

  • Wahba G, Craven P (1978) Smoothing noisy data with spline functions. Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–404

  • Zimmerman DL, Homer KE (1991) A network design criterion for estimating selected attributes of the semivariogram. Environmetrics 2(4):425–441

Acknowledgments

The authors thank the anonymous reviewers for their insightful and constructive comments. We also extend our grateful thanks to A. Chiplunkar from ISAE SUPAERO, Toulouse and R. G. Regis from Saint Joseph’s University, Philadelphia for their careful correction of the manuscript and to SNECMA for providing the tables of experiment results. Finally, B. Kraabel is gratefully acknowledged for carefully reviewing the paper prior to publication.

Author information

Corresponding author

Correspondence to Mohamed Amine Bouhlel.

Appendices

Appendix A: Equations for ordinary kriging model

The expression (3) for the ordinary kriging model can be rewritten as (see Forrester et al. 2008)

$$ y(\textbf{x}) = \hat{\beta} + \textbf{r}_{\textbf{x}\textbf{X}}^{t}\textbf{R}^{-1}\left( \textbf{y}-\textbf{1}\hat{\beta}\right), $$
(17)

where 1 denotes an n-vector of ones and

$$ \hat{\beta}= (\textbf{1}^{t}\textbf{R}^{-1}\textbf{1})^{-1}\textbf{1}^{t}\textbf{R}^{-1}\textbf{y}. $$
(18)

In addition, (6) is written as

$$ \hat{\sigma}^{2}=\frac{1}{n}\left( \textbf{y}-\textbf{1}\hat{\beta}\right)^{t}\textbf{R}^{-1}\left( \textbf{y}-\textbf{1}\hat{\beta}\right). $$
(19)
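To make (17)–(19) concrete, here is a minimal numerical sketch of the ordinary kriging predictor. It is not the authors' implementation; the Gaussian correlation function, the small regularization term, and the toy data are illustrative assumptions.

```python
import numpy as np

def gaussian_corr(A, B, theta):
    """Gaussian correlation between the rows of A and the rows of B (illustrative choice)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2 * theta).sum(axis=-1)
    return np.exp(-sq)

def ordinary_kriging_predict(x, X, y, theta):
    """Evaluate (18), (19), and (17): beta-hat, the process variance, and the kriging mean at x."""
    n = X.shape[0]
    R = gaussian_corr(X, X, theta) + 1e-10 * np.eye(n)       # correlation matrix R (regularized)
    r = gaussian_corr(x[None, :], X, theta).ravel()           # cross-correlation vector r_xX
    ones = np.ones(n)
    beta = (ones @ np.linalg.solve(R, y)) / (ones @ np.linalg.solve(R, ones))  # eq. (18)
    resid = y - beta * ones
    sigma2 = resid @ np.linalg.solve(R, resid) / n            # eq. (19)
    y_hat = beta + r @ np.linalg.solve(R, resid)              # eq. (17)
    return y_hat, beta, sigma2

# Usage with arbitrary toy data in d = 4 dimensions.
rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 4))
y = np.sin(X).sum(axis=1)
print(ordinary_kriging_predict(rng.uniform(size=4), X, y, np.full(4, 2.0)))
```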

Appendix B: Examples of kernels

Table 12 presents the most popular examples of stationary kernels. Table 13 presents the new KPLS kernels based on the examples given in Table 12.

Table 12 Examples of commonly used stationary covariance functions
Table 13 Examples of KPLS covariance functions
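Tables 12 and 13 are not reproduced here, so the sketch below contrasts only the Gaussian member of each family: the standard separable Gaussian kernel, with one length-scale parameter θ_i per input dimension, and a KPLS-type Gaussian kernel built as in (14), which uses one θ_l per retained PLS component together with the weight vectors w^(l). The parameter values and names are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(x, xp, sigma2, theta):
    """Standard separable Gaussian kernel: one theta_i per dimension (d hyperparameters)."""
    return sigma2 * np.exp(-np.sum(theta * (x - xp) ** 2))

def kpls_gaussian_kernel(x, xp, sigma2, theta, W):
    """KPLS-type Gaussian kernel built as in (14): one theta_l per PLS component (h hyperparameters).
    W has shape (d, h); its columns are the PLS weight vectors w^(l)."""
    sq = (x - xp) ** 2                                               # elementwise squared differences
    exponent = sum(theta[l] * np.sum((W[:, l] ** 2) * sq) for l in range(len(theta)))
    return sigma2 * np.exp(-exponent)

# Illustrative call: d = 20 inputs, h = 2 retained components.
rng = np.random.default_rng(0)
x, xp = rng.uniform(size=20), rng.uniform(size=20)
W = rng.normal(size=(20, 2))
print(gaussian_kernel(x, xp, 1.0, np.full(20, 0.5)))                 # needs 20 theta values
print(kpls_gaussian_kernel(x, xp, 1.0, np.array([0.5, 0.1]), W))     # needs only 2
```

The essential difference is the hyperparameter count: d values of θ for the standard kernel versus only h ≪ d for the KPLS kernel, which is what keeps the likelihood maximization tractable in high dimensions.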

Appendix C: Proof of equivalence kernel

For l = 1,…,h, the k_l are separable kernels (i.e., d-dimensional tensor products) of the same type, so there exist ϕ_{l1},…,ϕ_{ld} such that

$$ k_{l}\left( \textbf{x},{\textbf{x}^{\prime}}\right) = \prod\limits_{i=1}^{d} \phi_{li}(F_{l}\left( \textbf{x}\right)_{i},F_{l}\left( {\textbf{x}^{\prime}}\right)_{i}), $$
(20)

where F_l(x)_i is the ith coordinate of F_l(x). If we insert (20) into (14), we get

$$\begin{array}{@{}rcl@{}} k_{kpls1:h}(\textbf{x},\textbf{x}^{\prime}) &=& \prod\limits_{l=1}^{h} k_{l}(F_{l}\left( \textbf{x}\right),F_{l}\left( {\textbf{x}^{\prime}}\right))\\ &=&\prod\limits_{l=1}^{h} \prod\limits_{i=1}^{d} \phi_{li}(F_{l}\left( \textbf{x}\right)_{i},F_{l}\left( {\textbf{x}^{\prime}}\right)_{i})\\ &=&\prod\limits_{i=1}^{d}\prod\limits_{l=1}^{h} \phi_{li}(F_{l}\left( \textbf{x}\right)_{i},F_{l}\left( {\textbf{x}^{\prime}}\right)_{i})\\ &=&\prod\limits_{i=1}^{d}\psi_{i}\left( \textbf{x}_{i},{\textbf{x}^{\prime}}_{i}\right), \end{array} $$
(21)

with

$$\psi_{i}\left( \textbf{x}_{i},{\textbf{x}^{\prime}}_{i}\right)=\prod\limits_{l=1}^{h} \phi_{li}(F_{l}\left( \textbf{x}\right)_{i},F_{l}\left( {\textbf{x}^{\prime}}\right)_{i}), $$

corresponding to a one-dimensional kernel. Hence, k_{kpls1:h} is a separable kernel. In particular, if we consider a generalized exponential kernel with p_1 = ⋯ = p_h = p ∈ [0, 2], we obtain

$$\begin{array}{@{}rcl@{}} \psi_{i}\left( \textbf{x}_{i},{\textbf{x}^{\prime}}_{i}\right)&=&\sigma^{\frac{2}{d}}\exp\left( -\sum\limits_{l=1}^{h}\theta_{l}\left|\textbf{w}^{(l)}_{*i}\right|^{p} \left|\textbf{x}_{i}-{\textbf{x}^{\prime}}_{i}\right|^{p}\right)\\ &=&\sigma^{\frac{2}{d}}\exp\left( -\eta_{i}\left|\textbf{x}_{i}-{\textbf{x}^{\prime}}_{i}\right|^{p}\right), \end{array} $$
(22)

with

$$\eta_{i}=\sum\limits_{l=1}^{h}\theta_{l}\left|\textbf{w}^{(l)}_{*i}\right|^{p}. $$

We thus obtain

$$ k_{kpls1:h}\left( \textbf{x},{\textbf{x}^{\prime}}\right) = \sigma^{2}\prod\limits_{i=1}^{d} \exp\left( -\eta_{i}\left|\textbf{x}_{i}-{\textbf{x}^{\prime}}_{i}\right|^{p}\right). $$
(23)
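As a quick numerical check of (21)–(23), the sketch below (with arbitrary weights, θ_l, and p, and σ² set to 1 for simplicity) verifies that the product of the h projected generalized exponential kernels equals the single separable kernel whose exponents are the η_i defined above.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, p = 5, 2, 1.5
W = rng.normal(size=(d, h))            # columns: weight vectors w^(l) (arbitrary values)
theta = np.array([0.7, 0.3])           # one theta_l per retained component
x, xp = rng.uniform(size=d), rng.uniform(size=d)

# Left-hand side: product over l of generalized exponential kernels evaluated on the
# weighted coordinates F_l(x)_i = w^(l)_i * x_i, as in the factorization (21).
lhs = np.prod([np.exp(-theta[l] * np.sum(np.abs(W[:, l] * (x - xp)) ** p))
               for l in range(h)])

# Right-hand side: single separable kernel with eta_i = sum_l theta_l * |w^(l)_i|^p, as in (22)-(23).
eta = np.sum(theta[None, :] * np.abs(W) ** p, axis=1)
rhs = np.prod(np.exp(-eta * np.abs(x - xp) ** p))

print(np.isclose(lhs, rhs))            # True: the two factorizations agree
```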

Appendix D: Results for Griewank function in 20D and 60D over interval [−5, 5]

In Tables 14 and 15, the mean and standard deviation (std) of the numerical experiments with the Griewank function are given for 20 and 60 dimensions, respectively. To better visualize the results, boxplots are used in Figs. 7–11.

Table 14 Results for Griewank function in 20D over interval [−5, 5]
Table 15 Results for Griewank function in 60D over interval [−5, 5]
Fig. 7 RE for Griewank function in 20D over interval [−5, 5]. Experiments are based on 10 Latin hypercube designs

Fig. 8 CPU time for Griewank function in 20D over interval [−5, 5]. Experiments are based on 10 Latin hypercube designs

Fig. 9 RE for Griewank function in 60D over interval [−5, 5]. Experiments are based on 10 Latin hypercube designs

Fig. 10 CPU time for Griewank function in 60D over interval [−5, 5]. Experiments are based on 10 Latin hypercube designs

Fig. 11 CPU time for Griewank function in 60D, for the KPLS models only, over interval [−5, 5]. Experiments are based on 10 Latin hypercube designs

Cite this article

Bouhlel, M.A., Bartoli, N., Otsmane, A. et al. Improving kriging surrogates of high-dimensional design models by Partial Least Squares dimension reduction. Struct Multidisc Optim 53, 935–952 (2016). https://doi.org/10.1007/s00158-015-1395-9
