Abstract
In this paper, we investigate the capability of the universal Kriging (UK) model for single-objective global optimization within an efficient global optimization (EGO) framework. We implemented this combined UK-EGO framework and studied four UK variants: a UK with a first-order polynomial trend, a UK with a second-order polynomial trend, a blind Kriging (BK) implementation from the ooDACE toolbox, and a polynomial-chaos Kriging (PCK) implementation. The UK-EGO framework with automatic trend function selection, as derived from the BK and PCK models, works by building UK surrogate models and then optimizing the expected improvement criterion on the Kriging model with the lowest leave-one-out cross-validation (LOOCV) error. We compared the UK-EGO variants and standard EGO on five synthetic test functions and one aerodynamic problem. Our results show that a proper choice of trend function through automatic feature selection can improve the optimization performance of UK-EGO relative to EGO. PCK-EGO was the best variant, as it was more robust than the other UK-EGO schemes; however, a total-order expansion should be used to generate the candidate trend function set for high-dimensional problems. For some test functions, the UK with predetermined polynomial trend functions performed better than BK and PCK, indicating that automatic trend function selection does not always lead to the best-quality solutions. We also found that although some UK variants are not as globally accurate as ordinary Kriging (OK), they can still identify better-optimized solutions because the added trend function helps the optimizer locate the global optimum.
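As a point of reference, the expected improvement criterion mentioned in the abstract can be sketched in its standard closed form for minimization. This is a minimal sketch, assuming the surrogate returns a Gaussian prediction with mean `mu` and standard deviation `sigma` at a candidate point; the function name and arguments are illustrative and are not taken from the implementation used in the paper.

```python
import math

def expected_improvement(mu, sigma, y_best):
    """Expected improvement over y_best for a minimization problem.

    Assumes a Gaussian surrogate prediction with mean `mu` and standard
    deviation `sigma` (illustrative names, not from the paper's code).
    """
    if sigma <= 0.0:
        # No predictive uncertainty: use the sigma -> 0 limit of the formula.
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (y_best - mu) * cdf + sigma * pdf
```

The first term rewards candidates whose predicted value lies below the current best observation, and the second rewards predictive uncertainty, which is how the criterion balances exploitation against exploration.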
Acknowledgements
Koji Shimoyama was supported in part by the Grant-in-Aid for Scientific Research (B) No. H1503600 administered by the Japan Society for the Promotion of Science (JSPS).
Appendices
Appendix A: Trend function selection and hyperparameter optimization strategy for PCK
We propose a sequential hyperparameter optimization strategy in which the BFGS search at each iteration of the UK construction process is initialized with the optimum solution from the previous iteration. This strategy can be applied to both BK and PCK, since both methods work by scanning candidate terms from a provided polynomial set. Our primary motivation is that the likelihood function might change only slightly from one iteration to the next, so the optimum hyperparameters of successive iterations are likely to lie close to one another. A GA is used only in the first iteration, to find the optimum hyperparameters of the OK before any trend functions are added. Here, we use a GA with a population size of 100 and a maximum of 200 generations, followed by a BFGS search. This exhaustive search is used only in the first iteration because the accuracy of the subsequent hyperparameter optimizations relies on the accuracy of the OK hyperparameters. After the final trend function is identified, the GA+BFGS approach is applied once more with this final trend function to search for possibly higher values of the likelihood. We call this the simplified GA+BFGS strategy, as opposed to the exhaustive GA+BFGS strategy, which runs the full GA+BFGS search at every iteration.
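The control flow of the simplified strategy can be sketched as follows. This is only an illustration under loud assumptions: a compass (pattern) search stands in for BFGS, a uniform random search stands in for the GA, and hypothetical quadratic functions stand in for the negative log-likelihood at each UK iteration. None of the names below come from the paper's MATLAB implementation.

```python
import random

def local_search(f, x0, step=0.5, tol=1e-6):
    """Compass (pattern) search; a derivative-free stand-in for BFGS."""
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x[:]
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step *= 0.5  # refine the mesh once no neighbor improves
    return x, fx

def global_search(f, bounds, n_samples=2000, seed=0):
    """Uniform random search; a stand-in for the GA."""
    rng = random.Random(seed)
    candidates = ([rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_samples))
    return min(candidates, key=f)

def simplified_strategy(objectives, bounds):
    """objectives[k] is the function minimized at UK iteration k
    (for example a negative log-likelihood); objectives[0] is the OK model."""
    # Iteration 0: global search followed by local refinement.
    x, fx = local_search(objectives[0], global_search(objectives[0], bounds))
    # Later iterations: local search only, warm-started from the previous optimum.
    for f in objectives[1:]:
        x, fx = local_search(f, x)
    # Final trend: one more global + local search; keep the better result.
    f_final = objectives[-1]
    x_g, f_g = local_search(f_final, global_search(f_final, bounds, seed=1))
    return (x_g, f_g) if f_g < fx else (x, fx)

def make_toy_objective(center):
    # Hypothetical smooth objective whose minimizer is `center`.
    return lambda theta: sum((t - c) ** 2 for t, c in zip(theta, center))

# The minimizer drifts slightly between iterations, mimicking a likelihood
# surface that changes little when one trend term is added.
toy_objectives = [make_toy_objective([1.0 + 0.1 * k, -0.5 - 0.05 * k]) for k in range(4)]
theta_opt, value = simplified_strategy(toy_objectives, bounds=[(-3.0, 3.0), (-3.0, 3.0)])
```

Because each warm start begins close to the new optimum, the local searches after the first iteration need only a few function evaluations, which is the source of the time saving the strategy aims for.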
To verify the performance of our simplified GA+BFGS strategy, we compared it with the exhaustive GA+BFGS strategy on the five test functions listed in Appendix B. We set the sample size to 20, 60, and 40 for the two-dimensional problems, the Hartman-6 problem, and the borehole problem, respectively. We generally used $N_s = 10 \times m$ to generate the sample set, where $N_s$ is the sample size and $m$ is the number of input variables; the only exception is the borehole problem, for which we set the sample size to 40 to make the optimization problem more difficult. We also compared our simplified GA+BFGS strategy with a simple BFGS strategy that performs a single BFGS run from a random initial solution at each UK iteration. Specifically, we compared the lowest LOOCV errors obtained with these three strategies.
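The idea of ranking candidate models by their LOOCV error can be illustrated with a small refit-based sketch. The assumptions here are significant: two simple regression models stand in for Kriging models with different trend functions, the error measure is taken to be the root-mean-square of the leave-one-out residuals, and the data are synthetic. The paper's own error definition and Kriging-specific LOOCV computation may differ.

```python
import math

def fit_constant(xs, ys):
    """Constant-mean model (stand-in for a model with no trend terms)."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_linear(xs, ys):
    """One-dimensional least-squares line (stand-in for a first-order trend)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return lambda x: y_bar + slope * (x - x_bar)

def loocv_rmse(fit, xs, ys):
    """Refit the model with each sample left out and score the held-out point."""
    squared_error = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        squared_error += (model(xs[i]) - ys[i]) ** 2
    return math.sqrt(squared_error / len(xs))

# Synthetic data: a linear trend plus a small oscillation.
xs = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
ys = [2.0 * x + 1.0 + 0.05 * math.sin(20.0 * x) for x in xs]

candidates = {"constant": fit_constant, "linear": fit_linear}
errors = {name: loocv_rmse(fit, xs, ys) for name, fit in candidates.items()}
best = min(errors, key=errors.get)  # candidate with the lowest LOOCV error
```

The same selection rule applies whether the candidates differ in trend function or in the hyperparameter optimization strategy used to train them.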
For all five test functions, Fig. 11 shows that the LOOCV errors of the UK obtained with the exhaustive and simplified GA+BFGS strategies are similar to one another, whereas the simple BFGS strategy generally performed worse than the other two. The exception is the Hartman-6 problem, for which all strategies performed approximately the same; this problem is highly nonlinear and difficult, and the UK did not approximate it better than the standard OK, which explains why UK hyperparameter tuning has minimal effect on the LOOCV error. The lower performance of the simple BFGS strategy signifies that finding the optimum of the UK likelihood function is sensitive to the choice of the initial point.
Training the hyperparameters on a two-dimensional function with $p = 4$ took approximately 3 s with the simplified strategy and 40 s with the exhaustive strategy, on a computer with an Intel Xeon E5-1630 v4 8-core CPU @ 3.70 GHz running MATLAB. This indicates that our simplified strategy can perform similarly to the exhaustive strategy in only 7.5% of the time required by the exhaustive approach.
Appendix B: Test functions
1. Branin function (two variables).

$$
\begin{aligned}
f_{1}(\boldsymbol{x}) &= \left( b_{2}-\frac{5.1}{4\pi^{2}}b_{1}^{2}+\frac{5}{\pi}b_{1}-6 \right)^{2} \\
&\quad + 10 \left[\left( 1-\frac{1}{8\pi} \right) \cos(b_{1}) + 1\right],
\end{aligned}
\tag{29}
$$

where $b_{1} = 15x_{1} - 5$, $b_{2} = 15x_{2}$, and $(x_{1}, x_{2}) \in [0, 1]^{2}$.

2. Sasena function (two variables).

$$
\begin{aligned}
f(\boldsymbol{x}) &= 2 + 0.01(x_{2}-x_{1}^{2})^{2}+(1-x_{1})^{2} + 2(2-x_{2})^{2} \\
&\quad + 7\sin(0.5x_{1}) \sin(0.7x_{1}x_{2}), \\
&\quad x_{1}\in[0,5],\; x_{2}\in[0,5].
\end{aligned}
\tag{30}
$$

3. Hosaki function (two variables).

$$
\begin{aligned}
f(\boldsymbol{x}) &= \left( 1-8x_{1}+ 7x_{1}^{2} - (7/3)x_{1}^{3} +(1/4)x_{1}^{4} \right) x_{2}^{2}e^{-x_{1}}, \\
&\quad x_{1}\in[0,5],\; x_{2}\in[0,5].
\end{aligned}
\tag{31}
$$

4. Hartman-6 function (six variables).

$$
\begin{aligned}
f(\boldsymbol{x}) &= -\sum_{i = 1}^{4}c_{i}\exp \left\{-\sum_{j = 1}^{n}\mathrm{A}_{ij}(x_{j}-\mathrm{P}_{ij})^{2}\right\}, \\
\boldsymbol{x} &= (x_{1},x_{2},\ldots,x_{n})^{T}, \quad x_{i}\in[0,1],
\end{aligned}
\tag{32}
$$

where

$$
\boldsymbol{c} = [1.0, 1.2, 3, 3.2]^{T},
\tag{33}
$$

$$
\mathbf{A} = \left[\begin{array}{cccccc}
10 & 3 & 17 & 3.5 & 1.7 & 8\\
0.05 & 10 & 17 & 0.1 & 8 & 14\\
3 & 3.5 & 1.7 & 10 & 17 & 8\\
17 & 8 & 0.05 & 10 & 0.1 & 14
\end{array}\right],
\tag{34}
$$

$$
\mathbf{P} = 10^{-4} \left[\begin{array}{cccccc}
1312 & 1696 & 5569 & 124 & 8283 & 5886\\
2329 & 4135 & 8307 & 3736 & 1004 & 9991\\
2348 & 1451 & 3522 & 2883 & 3047 & 6650\\
4047 & 8828 & 8732 & 5743 & 1091 & 381
\end{array}\right].
\tag{35}
$$

5. Borehole function (eight variables).

$$
f(\boldsymbol{x}) = \frac{2\pi T_{u}(H_{u}-H_{l})}{\ln(r/r_{w})\left( 1+\frac{2LT_{u}}{\ln(r/r_{w})r_{w}^{2}K_{w}}+\frac{T_{u}}{T_{l}}\right)},
\tag{36}
$$

where the input variables are defined as shown in Table 3.
Table 3 The input variables and their input ranges for the borehole test function
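For readers who want to reproduce the benchmarks, the following is a minimal Python transcription of Eqs. (29), (30), (32)-(35), and (36). It is a sketch rather than the authors' code (which ran in MATLAB); the function names and the keyword-style argument order of `borehole` are assumptions, the Hosaki function is omitted for brevity, and no input ranges are enforced.

```python
import math

def branin(x1, x2):
    """Branin function, Eq. (29), with inputs scaled to [0, 1]^2."""
    b1, b2 = 15.0 * x1 - 5.0, 15.0 * x2
    term = b2 - 5.1 / (4.0 * math.pi ** 2) * b1 ** 2 + 5.0 / math.pi * b1 - 6.0
    return term ** 2 + 10.0 * ((1.0 - 1.0 / (8.0 * math.pi)) * math.cos(b1) + 1.0)

def sasena(x1, x2):
    """Sasena function, Eq. (30), with x1, x2 in [0, 5]."""
    return (2.0 + 0.01 * (x2 - x1 ** 2) ** 2 + (1.0 - x1) ** 2
            + 2.0 * (2.0 - x2) ** 2
            + 7.0 * math.sin(0.5 * x1) * math.sin(0.7 * x1 * x2))

# Hartman-6 constants, Eqs. (33)-(35).
C = [1.0, 1.2, 3.0, 3.2]
A = [
    [10.0, 3.0, 17.0, 3.5, 1.7, 8.0],
    [0.05, 10.0, 17.0, 0.1, 8.0, 14.0],
    [3.0, 3.5, 1.7, 10.0, 17.0, 8.0],
    [17.0, 8.0, 0.05, 10.0, 0.1, 14.0],
]
P = [[1e-4 * v for v in row] for row in [
    [1312, 1696, 5569, 124, 8283, 5886],
    [2329, 4135, 8307, 3736, 1004, 9991],
    [2348, 1451, 3522, 2883, 3047, 6650],
    [4047, 8828, 8732, 5743, 1091, 381],
]]

def hartman6(x):
    """Hartman-6 function, Eq. (32), for a length-6 sequence x in [0, 1]^6."""
    total = 0.0
    for c_i, a_row, p_row in zip(C, A, P):
        exponent = sum(a * (x_j - p) ** 2 for a, x_j, p in zip(a_row, x, p_row))
        total += c_i * math.exp(-exponent)
    return -total

def borehole(r_w, r, T_u, H_u, T_l, H_l, L, K_w):
    """Borehole function, Eq. (36); the argument order is an assumption."""
    log_ratio = math.log(r / r_w)
    denom = log_ratio * (1.0 + 2.0 * L * T_u / (log_ratio * r_w ** 2 * K_w) + T_u / T_l)
    return 2.0 * math.pi * T_u * (H_u - H_l) / denom
```

A quick sanity check is to evaluate `branin` at $x_1 = (\pi + 5)/15$, $x_2 = 2.275/15$, where the squared term vanishes and the value reduces to $10/(8\pi)$.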
Appendix C: Boxplot
In the boxplots, the bottom and top of each box represent the lower quartile Q1 (25th percentile) and the upper quartile Q3 (75th percentile), respectively. The line inside the box represents the median (50th percentile). The whiskers below and above the box extend to Q1 − 1.5 IQR and Q3 + 1.5 IQR, respectively, where IQR = Q3 − Q1 is the interquartile range. Observations that lie beyond the whiskers are identified as outliers. Finally, the circle denotes the mean of the observations.
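The quantities described above can be computed with a short sketch like the one below. The quartile interpolation rule is an assumption (Python's `statistics.quantiles` with `method="inclusive"`); the plotting software used for the paper's figures may interpolate quartiles differently, so values can differ slightly for small samples. The sample data are made up for illustration.

```python
import statistics

def boxplot_stats(values):
    """Quartiles, whisker limits, outliers, and mean of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_limit": lower_limit,
        "upper_limit": upper_limit,
        # Observations beyond the whisker limits are flagged as outliers.
        "outliers": [v for v in values if v < lower_limit or v > upper_limit],
        "mean": statistics.fmean(values),
    }

# Hypothetical sample with one extreme observation.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

In this sample the single extreme value pulls the mean well above the median, which is why the boxplots mark the mean separately with a circle.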
Cite this article
Palar, P.S., Shimoyama, K. On efficient global optimization via universal Kriging surrogate models. Struct Multidisc Optim 57, 2377–2397 (2018). https://doi.org/10.1007/s00158-017-1867-1
DOI: https://doi.org/10.1007/s00158-017-1867-1