## Abstract

This paper presents an automated surrogate model selection framework, the Concurrent Surrogate Model Selection (COSMOS). Unlike most existing techniques, COSMOS operates coherently at three levels: 1) selecting the model type (e.g., RBF or Kriging); 2) selecting the kernel function type (e.g., cubic or multiquadric kernel in RBF); and 3) determining the optimal values of the typically user-prescribed hyper-parameters (e.g., shape parameter in RBF). The quality of the models is determined and compared using measures of median and maximum error given by the Predictive Estimation of Model Fidelity (PEMF) method, a robust implementation of sequential *k*-fold cross-validation. The selection process follows either a cascaded approach over the three levels or a more computationally efficient one-step approach that solves a mixed-integer nonlinear programming problem; genetic algorithms are used to perform the optimal selection. Applying COSMOS to benchmark test functions yields optimal model choices that agree well with those obtained by analyzing the model errors on a large set of additional test points. Across the four analytical benchmark problems and three practical engineering applications – airfoil design, window heat transfer modeling, and building energy modeling – diverse forms of models/kernels are selected as optimal choices. These observations further establish the need for automated multi-level model selection guided by dependable measures of model fidelity.

## References

Acar E (2010) Optimizing the shape parameters of radial basis functions: An application to automobile crashworthiness. Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering 224(12):1541–1553

Acar E, Rais-Rohani M (2009) Ensemble of metamodels with optimized weight factors. Struct Multidiscip Optim 37(3):279–294

Ali MM, Khompatraporn C, Zabinsky ZB (2005) A numerical evaluation of several stochastic algorithms on selected continuous global optimization test problems. J Glob Optim 31(4):635–672

Ascione F, Bianco N, Stasio CD, Mauro GM, Vanoli GP (2017) Artificial neural networks to predict energy performance and retrofit scenarios for any member of a building category: a novel approach. Energy 26(118):999–1017

Basak D, Srimanta P, Patranabis DC (2007) Support vector regression. Neural Information Processing-Letters and Review 11(10):203–224

Ben-Hur A, Weston J (2010) A user’s guide to support vector machines. Data mining techniques for the life sciences, pp 223–239

Blatman G, Sudret B (2011) Adaptive sparse polynomial chaos expansion based on least angle regression. J Comput Phys 230(6):2345–2367

Bozdogan H (2000) Akaike’s information criterion and recent developments in information complexity. J Math Psychol 44:62–91

Chang C-C, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27

Chen PW, Wang JY, Lee HM (2004) Model selection of SVMs using GA approach. In: IEEE international joint conference on neural networks, 2004. Proceedings. 2004, IEEE, vol 3, pp 2035–2040

Chen X, Yang H, Sun K (2017) Developing a meta-model for sensitivity analyses and prediction of building performance for passively designed high-rise residential buildings. Appl Energy 194:422–439

Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge Books

Coelho F, Breitkopf P, Knopf-Lenoir C (2008) Model reduction for multidisciplinary optimization: application to a 2d wing. Struct Multidiscip Optim 37(1):29–48

Couckuyt I, Dhaene T, Demeester P (2014) Oodace toolbox: a flexible object-oriented Kriging implementation. J Mach Learn Res 15(1):3183–3186

Crawley DB, Pedersen CO, Lawrie LK, Winkelmann FC (2000) EnergyPlus: energy simulation program. ASHRAE J 42(4):49–56

Cressie N (1993) Statistics for spatial data. Wiley, New York

Deb K (2002) A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput 6(2):182–197

Deru M, Field K, Studer D, Benne K, Griffith B, Torcellini P, Liu B (2011) US department of energy commercial reference building models of the national building stock. Tech. rep., Department of Energy

Deschrijver D, Dhaene T (2005) An alternative approach to avoid overfitting for surrogate models. In: Signal propagation on interconnects, 2005. Proceedings. 9th IEEE workshop, pp 111–114

DOE (2017) Commercial prototype building models. http://www.energycodes.gov/development/commercial. Accessed 15 Jan 2017

Fang A, Rais-Rohani M, Liu Z, Horstemeyer MF (2005) A comparative study of metamodeling methods for multiobjective crashworthiness optimization. Comput Struct 83(25):2121–2136

Forrester A, Keane A (2009) Recent advances in surrogate-based optimization. Prog Aerosp Sci 45(1-3):50–79

Forrester A, Sobester A, Keane A (2008) Engineering design via surrogate modelling: a practical guide. Wiley

Giovanis DG, Papaioannou I, Straub D, Papadopoulos V (2017) Bayesian updating with subset simulation using artificial neural networks. Comput Methods Appl Mech Eng 319:124–145

Giunta AA, Watson L (1998) A comparison of approximation modeling techniques: polynomial versus interpolating models. AIAA Journal (AIAA-98-4758)

Goel T, Stander N (2009) Comparing three error criteria for selecting radial basis function network topology. Comput Methods Appl Mech Eng 198:2137–2150

Gorissen D, Dhaene T, Turck FD (2009) Evolutionary model type selection for global surrogate modeling. J Mach Learn Res 10:2039–2078

Gorissen D, Couckuyt I, Demeester P, Dhaene T, Crombecq K (2010) A surrogate modeling and adaptive sampling toolbox for computer based design. J Mach Learn Res 11:2051–2055

Haftka RT, Villanueva D, Chaudhuri A (2016) Parallel surrogate-assisted global optimization with expensive functions – a survey. Struct Multidiscip Optim 54(1):3–13

Hamza K, Saitou K (2012) A co-evolutionary approach for design optimization via ensembles of surrogates with application to vehicle crashworthiness. J Mech Des 134(1):011,001–10

Hardy RL (1971) Multiquadric equations of topography and other irregular surfaces. J Geophys Res 76:1905–1915

Holena M, Demut R (2011) Assessing the suitability of surrogate models in evolutionary optimization. In: Information technologies, pp 31–38

Jakeman JD, Narayan A, Zhou T (2017) A generalized sampling and preconditioning scheme for sparse approximation of polynomial chaos expansions. SIAM J Sci Comput 39(3):A1114–A1144

Jia G, Taflanidis AA (2013) Kriging metamodeling for approximation of high-dimensional wave and surge responses in real-time storm/hurricane risk assessment. Comput Methods Appl Mech Eng 261:24–38

Jin R, Chen W, Simpson TW (2001) Comparative studies of metamodelling techniques under multiple modelling criteria. Struct Multidiscip Optim 23(1):1–13

Lee H, Jo Y, Lee D, Choi S (2016) Surrogate model based design optimization of multiple wing sails considering flow interaction effect. Ocean Eng 121:422–436

Li YF, Ng SH, Xie M, Goh TN (2010) A systematic comparison of metamodeling techniques for simulation optimization in decision support systems. Appl Soft Comput 10(2):255–268

Lin S (2011) A NSGA-II program in MATLAB, version 1.4

Lophaven SN, Nielsen HB, Sondergaard J (2002) DACE – a MATLAB Kriging toolbox, version 2.0. Tech. Rep. IMM-REP-2002-12. Informatics and Mathematical Modelling, Technical University of Denmark

Martin JD, Simpson TW (2005) Use of kriging models to approximate deterministic computer models. AIAA J 43(4):853–863

Mehmani A, Chowdhury S, Messac A (2015a) Predictive quantification of surrogate model fidelity based on modal variations with sample density. Struct Multidiscip Optim 52(2):353–373

Mehmani A, Chowdhury S, Tong W, Messac A (2015b) Adaptive switching of variable-fidelity models in population-based optimization. In: Engineering and applied sciences optimization, computational methods in applied sciences, vol 38. Springer International Publishing, pp 175–205

Molinaro AM, Simon R, Pfeiffer RM (2005) Prediction error estimation: a comparison of resampling methods. Bioinformatics 21(15):3301–3307

Mongillo M (2011) Choosing basis functions and shape parameters for radial basis function methods. In: SIAM undergraduate research online

Qudeiri JEA, Khadra FYA, Umer U, Hussein HMA (2015) Response surface metamodel to predict springback in sheet metal air bending process. International Journal of Materials, Mechanics and Manufacturing 3(4):203–224

Queipo N, Haftka R, Shyy W, Goel T, Vaidyanathan R, Tucker P (2005) Surrogate-based analysis and optimization. Prog Aerosp Sci 41(1):1–28

Reute IM, Mailach VR, Becker KH, Fischersworring-Bunk A, Schlums H, Ivankovic M (2017) Moving least squares metamodels-hyperparameter, variable reduction and model selection. In: 14th international probabilistic workshop. Springer International Publishing, pp 63–80

Rippa S (1999) An algorithm for selecting a good value for the parameter c in radial basis function interpolation. Adv Comput Math 11(2-3):193–210

Roustant O, Ginsbourger D, Deville Y (2012) Dicekriging, diceoptim: two r packages for the analysis of computer experiments by Kriging-based metamodeling and optimization. J Stat Softw 51(1):518–523

Soares C, Brazdil PB, Kuba P (2004) A meta-learning method to select the kernel width in support vector regression. Mach Learn 54(3):195–209

Solomatine D, Ostfeld A (2008) Data-driven modelling: some past experiences and new approaches. J Hydroinf 10(1):3–22

Takahashi R, Prasai D, Adams BL, Mattson CA (2012) Hybrid bishop-hill model for elastic-yield limited design with non-orthorhombic polycrystalline metals. J Eng Mater Technol 134(1):0110,031–12

Tian W (2013) A review of sensitivity analysis methods in building energy analysis. Renew Sust Energ Rev 20:411–419

Viana FAC, Haftka RT, Steffen V (2009) Multiple surrogates: how cross-validation errors can help us to obtain the best predictor. Struct Multidiscip Optim 39:439–457

Viana FAC, Venter G, Balabanov V (2010) An algorithm for fast optimal latin hypercube design of experiments. Int J Numer Methods Eng 82(2):135–156

Zhang J, Messac A, Zhang J, Chowdhury S (2014) Adaptive optimal design of active thermoelectric windows using surrogate modeling. Optim Eng 15(2):469–483

Zhang M, Gou W, Li L, Yang F, Yue Z (2016) Multidisciplinary design and multi-objective optimization on guide fins of twin-web disk using Kriging surrogate model. Struct Multidiscip Optim 55(1):361–373

Zhang Y, Park C, Kim NH, Haftka RT (2017) Function prediction at one inaccessible point using converging lines. J Mech Des 139(5):051,402

## Acknowledgements

Support from the National Science Foundation (NSF) Awards CMMI-1642340 and CNS-1524628 is gratefully acknowledged. Any opinions, findings, conclusions, or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.

## Author information

### Authors and Affiliations

### Contributions

The different core concepts underlying PEMF and COSMOS were conceived, implemented, and tested (in MATLAB) by Ali Mehmani and Souma Chowdhury, with important conceptual contributions from Achille Messac with regard to the surrogate modeling paradigm. The airfoil design and building peak cooling models in this paper were developed and implemented by Ali Mehmani, with support from Christoph Meinrenken on the latter.

### Corresponding author

## Additional information

Parts of this manuscript have been presented at the ASME International Design Engineering Technical Conferences & Computers and Information in Engineering Conference, in 2014, at Buffalo, NY - Paper Number: DETC2014-35358

## Appendices

### Appendix A: Surrogate model candidates

### 1.1 Radial basis function (RBF)

The idea of using Radial Basis Functions (RBF) as approximation functions was conceived by Hardy (1971). The RBF approximation is a linear combination of basis functions (Ψ) computed with respect to each sample point, as given by

In (9), \(n_p\) denotes the number of selected sample points; \(w_i\) are the weights, estimated using the pseudo-inverse method based on the training data; and \(\psi\) is the basis function, expressed in terms of the Euclidean distance, \(r = \lVert x - x^{i} \rVert\), of a point \(x\) from a given sample point, \(x^{i}\). The different basis functions considered in this paper are listed in Table 5, where \(\sigma\) represents the shape parameter of the basis function. The shape parameter has a strong impact on the accuracy of the trained RBF: a smaller shape parameter often corresponds to a wider basis function, and \(\sigma = 0\) corresponds to a constant basis function (Mongillo 2011).
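As a concrete illustration, the sketch below trains a multiquadric RBF with weights obtained via the pseudo-inverse, as described above. This is a minimal NumPy sketch (the paper's implementation was in MATLAB); the test function, sample size, and shape parameter value are illustrative, not taken from the paper.

```python
import numpy as np

def rbf_fit(X, y, sigma=0.5):
    """Estimate RBF weights w by the pseudo-inverse method (multiquadric basis)."""
    # Pairwise Euclidean distances r between the n_p sample points
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    Psi = np.sqrt(r**2 + sigma**2)   # multiquadric basis: psi(r) = sqrt(r^2 + sigma^2)
    return np.linalg.pinv(Psi) @ y   # weights via pseudo-inverse

def rbf_predict(X_train, w, X_new, sigma=0.5):
    """Evaluate the RBF surrogate: a weighted sum of basis functions."""
    r = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=-1)
    return np.sqrt(r**2 + sigma**2) @ w

# Usage: approximate f(x) = x1^2 + x2^2 from 40 random samples
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (40, 2))
y = (X**2).sum(axis=1)
w = rbf_fit(X, y)
yhat = rbf_predict(X, w, X)
print(np.max(np.abs(yhat - y)))  # near-zero: the RBF interpolates the training data
```

Since the multiquadric basis matrix is nonsingular for distinct sample points, the trained model interpolates the training data exactly (up to numerical precision).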

### 1.2 Kriging

Kriging (Giunta and Watson 1998) is an approach to approximating irregular data. The Kriging approximation function consists of two components: (i) a global trend function, and (ii) a deviation function representing the departure from the trend function. The trend function is generally a polynomial (e.g., constant, linear, or quadratic). The general form of the Kriging surrogate model is given by Cressie (1993):

where \(\bar{F}(x)\) is the unknown function of interest, \(Z(x)\) is the realization of a stochastic process with zero mean and nonzero covariance, and \(\hat{f}\) is the known approximation function,

where \(\varphi\) is the regression parameter matrix. The \((i,j)\)-th element of the covariance matrix of \(Z(x)\) is given by

where \(R_{ij}\) is the correlation function between the \(i\)-th and the \(j\)-th data points, and \({\sigma_{z}^{2}}\) is the process variance, which scales the spatial correlation function. The popular types of correlation functions are listed in Table 5. The correlation function controls the smoothness of the Kriging model estimate, based on the influence of other nearby points on the point of interest. In Kriging, the regression function coefficients, the process variance, and the correlation function parameters, \(\{\varphi, {\sigma_{z}^{2}}, \theta\}\), can each be predefined or estimated using parameter estimation methods such as Maximum Likelihood Estimation (MLE). In this paper, the regression function coefficients and the process variance are estimated using MLE, as given by

where \(Y = [y_1\; y_2 \ldots]\) represents the vector of actual outputs at the training points; \(R\) is the correlation matrix; and \(F\) is the matrix of \(f(x)\) evaluated at each training point (Martin and Simpson 2005). The hyper-parameter \(\theta\) in the correlation function is determined by solving a nonlinear hyper-parameter optimization problem. In this paper, a Kriging model with a first-order regression polynomial is used. The regression function coefficients and the process variance are estimated using the DACE (design and analysis of computer experiments) package developed by Lophaven et al. (2002).
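The estimation steps above can be sketched as follows. For brevity, this NumPy sketch assumes a constant trend (rather than the paper's first-order polynomial), a Gaussian correlation function, and a fixed \(\theta\) instead of the nonlinear hyper-parameter optimization; the jitter value and test function are illustrative.

```python
import numpy as np

def kriging_fit(X, y, theta=25.0):
    """Kriging with a constant trend and Gaussian correlation, fixed theta.
    The trend coefficient beta and process variance sigma_z^2 come from the
    closed-form MLE expressions (cf. Martin and Simpson 2005)."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :])**2).sum(-1)
    R = np.exp(-theta * d2) + 1e-10 * np.eye(n)      # correlation matrix (jittered)
    F = np.ones((n, 1))                              # constant trend basis
    Ri = np.linalg.inv(R)
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y)  # GLS trend coefficient
    resid = y - F @ beta
    sigma2 = (resid @ Ri @ resid) / n                # MLE process variance
    return {"X": X, "beta": beta, "w": Ri @ resid, "theta": theta, "sigma2": sigma2}

def kriging_predict(model, X_new):
    """BLUP prediction: trend plus correlation-weighted residuals."""
    d2 = ((X_new[:, None, :] - model["X"][None, :, :])**2).sum(-1)
    r = np.exp(-model["theta"] * d2)                 # correlation with training pts
    return model["beta"][0] + r @ model["w"]

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, (12, 2))
y = np.sin(4 * X[:, 0]) + X[:, 1]**2
m = kriging_fit(X, y)
print(np.max(np.abs(kriging_predict(m, X) - y)))  # near-zero: Kriging interpolates
```

Note the mean-reversion property: far from all training points the correlation vector vanishes, so the prediction falls back to the trend coefficient \(\beta\).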

### 1.3 Support vector regression (SVR)

Support Vector Regression (SVR) is a relatively newer regression-type surrogate model. For a given training set of instance-label pairs \((x_i, y_i)\), \(i = 1, \ldots, n_p\), where \(x_i \in \mathbb{R}^n\) and \(y_i \in \mathbb{R}\), a linear SVR is defined by \(f(x) = \langle w, x \rangle + b\), where \(b\) is a bias and \(\langle \cdot, \cdot \rangle\) denotes the dot product. To train the SVR, the error, \(|\xi| = |y - f(x)|\), is minimized by solving the following convex optimization problem:

In (14), \(\varepsilon \geq 0\) defines the tolerated difference between the actual and the predicted values (the \(\varepsilon\)-insensitive tube); \(\xi_i\) and \(\tilde{\xi}_i\) are the slack variables; \(C\) is the penalty parameter that controls the trade-off between the flatness of the function and the tolerated deviations; and \(n_p\) is the number of training points. By applying kernel functions, \(K(\alpha, \beta) = \langle \phi(\alpha), \phi(\beta) \rangle\), under the KKT conditions, the original problem is mapped into a higher-dimensional space. The dual form of SVR for nonlinear regression can be represented as

The standard kernel functions used in SVR are listed in Table 5. The performance of SVR depends on its penalty parameter \(C\) and the kernel parameters \(\gamma\), \(r\), and \(d\). Using hyper-parameter optimization, these parameters can be estimated so as to minimize the model error. To implement SVR in this paper, the LIBSVM (A Library for Support Vector Machines) package (Chang and Lin 2011) is used.
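The role of \(\varepsilon\) and the slack variables in (14) can be made concrete through the \(\varepsilon\)-insensitive loss that the SVR objective penalizes: deviations inside the \(\varepsilon\)-tube cost nothing, while larger deviations are penalized linearly (those excesses are exactly the slacks). A small NumPy illustration with arbitrary values:

```python
import numpy as np

def eps_insensitive_loss(y, f, eps):
    """SVR's epsilon-insensitive loss: zero inside the eps-tube,
    linear outside (the excess corresponds to the slack variables xi)."""
    return np.maximum(0.0, np.abs(y - f) - eps)

y = np.array([1.0, 2.0, 3.0])   # actual values
f = np.array([1.05, 2.5, 2.0])  # predicted values
print(eps_insensitive_loss(y, f, eps=0.1))  # → [0.  0.4 0.9]
```

The first point lies inside the tube (deviation 0.05 < \(\varepsilon\)) and incurs no loss; the other two are penalized only for the portion of their deviation beyond \(\varepsilon\).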

### Appendix B: Analytical test functions

*Branin-Hoo function* (2 variables):

where \(x_1 \in [-5, 10]\), \(x_2 \in [0, 15]\).
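The Branin-Hoo equation itself did not survive extraction above; as a reference, a sketch using the standard definition of the function, consistent with the stated domain, is:

```python
import math

def branin_hoo(x1, x2):
    """Standard Branin-Hoo test function on x1 in [-5, 10], x2 in [0, 15]."""
    a = 1.0
    b = 5.1 / (4 * math.pi**2)
    c = 5 / math.pi
    r, s, t = 6.0, 10.0, 1 / (8 * math.pi)
    return a * (x2 - b * x1**2 + c * x1 - r)**2 + s * (1 - t) * math.cos(x1) + s

# One of the three global minima, f* ≈ 0.3979, lies at (pi, 2.275)
print(round(branin_hoo(math.pi, 2.275), 4))  # → 0.3979
```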

*Hartmann function* (3 variables):

where \(x = (x_1\, x_2 \ldots x_n)\), \(x_i \in [0, 1]\).

In this function, the number of variables, *n* = 3; the constants c, A, and P are, respectively, a 1 × 4 vector, a 4 × 3 matrix, and a 4 × 3 matrix:
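The constant matrices themselves did not survive extraction above; a sketch using the standard Hartmann-3 constants, which match the stated dimensions, is:

```python
import math

# Standard Hartmann-3 constants: 1x4 vector c, 4x3 matrices A and P
C = [1.0, 1.2, 3.0, 3.2]
A = [[3.0, 10.0, 30.0],
     [0.1, 10.0, 35.0],
     [3.0, 10.0, 30.0],
     [0.1, 10.0, 35.0]]
P = [[0.3689, 0.1170, 0.2673],
     [0.4699, 0.4387, 0.7470],
     [0.1091, 0.8732, 0.5547],
     [0.0381, 0.5743, 0.8828]]

def hartmann3(x):
    """Standard 3-variable Hartmann function on [0, 1]^3."""
    return -sum(
        C[i] * math.exp(-sum(A[i][j] * (x[j] - P[i][j])**2 for j in range(3)))
        for i in range(4)
    )

# Known global minimum f* ≈ -3.8628 at x* ≈ (0.1146, 0.5556, 0.8525)
print(round(hartmann3([0.114614, 0.555649, 0.852547]), 4))
```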

*Perm Function* (10 variables):

*Dixon & Price Function* (50 variables):
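The equation following this heading did not survive extraction; a sketch using the standard Dixon & Price definition, whose global minimum value is zero, is:

```python
def dixon_price(x):
    """Standard Dixon & Price function; f* = 0 at x_i = 2^(-(2^i - 2)/2^i)."""
    return (x[0] - 1)**2 + sum(
        (i + 1) * (2 * x[i]**2 - x[i - 1])**2 for i in range(1, len(x))
    )

# The known optimum for the 50-variable case used in the paper
xstar = [2 ** (-(2**i - 2) / 2**i) for i in range(1, 51)]
print(round(dixon_price(xstar), 10))  # → 0.0
```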

### Appendix C: Relationship between the error of the model selected by *COSMOS* and the size and distribution of the training data set

In this section, we explore the relationship between the error of the model selected by *COSMOS* and the size and distribution of the training data set, for the Branin-Hoo benchmark function. A single model-kernel combination is considered here (RBF with the multiquadric basis function). A set of 200 training data sets is randomly generated: X_{1}, X_{2}, X_{3}, ..., X_{200}. The sizes of the training data sets are defined as: |X_{1 to 40}| = 30, |X_{41 to 80}| = 60, |X_{81 to 120}| = 90, |X_{121 to 160}| = 120, and |X_{161 to 200}| = 150. The distribution of samples differs (randomly) across sets of the same size. For each data set, COSMOS is applied to find the hyper-parameter value that minimizes the median error metric.

The median error of the selected model for the different data sets is illustrated in Fig. 21 as a series of boxplots, with each boxplot corresponding to sample sets of one given size. It is readily evident that the error of the selected model is highly sensitive to the size of the training data set. Although the influence of the training data distribution is also evident from the significant observed variance in the resulting model error, no particular trend is observed.

### Appendix D: Implementation of COSMOS on Hartmann and Perm functions

The final solutions, including the best trade-offs between the median and the maximum errors (in the *Φ*_{0}, *Φ*_{1}, and *Φ*_{2} classes), for the Hartmann and Perm functions are illustrated in Figs. 22 and 23. For the Hartmann test function, *RBF* with the *Gaussian* and *Multiquadric* basis functions and *Kriging* with the *Gaussian* correlation function under the *Φ*_{1} class constitute the set of Pareto models in both the *Cascaded* and *One-Step* techniques (Table 4). It is observed from Fig. 22 that, in this problem, the Pareto solutions given by *COSMOS* for the different hyper-parameter classes have a larger spread than those given by the actual error. However, in terms of the best trade-off models, there is fair agreement between the results of *COSMOS* and those determined from the actual errors (Table 4). Unlike the Branin-Hoo test function, for the Hartmann function there is noticeable overlap between the final solutions from the different hyper-parameter classes.

Figure 23 and Table 4 show that for the Perm test function, at least one model-kernel combination from each of the three classes (*Φ*_{0}, *Φ*_{1}, and *Φ*_{2}) contributes to the Pareto optimal set. In this test problem, *Kriging* with the *linear correlation function* and *SVR* with the *sigmoid kernel function* are selected as the best models with the lowest median error and the lowest maximum error, respectively. It can be seen from Table 4 that there is promising agreement between the model-kernel combinations chosen by *COSMOS* and those chosen based on the actual error. From the COSMOS and the actual error results, we also observe that a Pareto solution from the *Φ*_{0} class (*RBF-Linear*) is located at the elbow of the Pareto frontier, which could be considered to represent a practically attractive, best trade-off model choice.

## About this article

### Cite this article

Mehmani, A., Chowdhury, S., Meinrenken, C. *et al.* Concurrent surrogate model selection (COSMOS): optimizing model type, kernel function, and hyper-parameters.
*Struct Multidisc Optim* **57**, 1093–1114 (2018). https://doi.org/10.1007/s00158-017-1797-y