# Performance evaluation of metamodelling methods for engineering problems: towards a practitioner guide


## Abstract

Metamodelling or surrogate modelling techniques are frequently used across the engineering disciplines in conjunction with expensive simulation models or physical experiments. With the proliferation of metamodeling techniques developed to provide enhanced performance for specific problems, and the wide availability of a diverse choice of tools in engineering software packages, the engineering task of selecting a robust metamodeling technique for practical problems is still a challenge. This research introduces a framework for describing the typology of engineering problems, in terms of dimensionality and complexity, and the modelling conditions, reflecting the noisiness of the signals and the affordability of sample sizes, and on this basis presents a systematic evaluation of the performance of frequently used metamodeling techniques. A set of metamodeling techniques, selected based on their reported use for engineering problems (i.e. Polynomial, Radial Basis Function, and Kriging), were systematically evaluated in terms of accuracy and robustness against a carefully assembled set of 18 test functions covering different types of problems, sampling conditions and noise conditions. A set of four real-world engineering case studies covering both computer simulation and physical experiments were also analysed as validation tests for the proposed guidelines. The main conclusions drawn from the study are that Kriging model with Matérn 5/2 correlation function performs consistently well across different problem types with smooth (i.e. not noisy) data, while Kriging model with Matérn 3/2 correlation function provides robust performance under noisy conditions, except for the very high noise conditions, where the Kriging model with nugget appears to provide better models. 
These results provide engineering practitioners with a guide for the choice of a metamodeling technique for problem types and modelling conditions represented in the study, whereas the evaluation framework and benchmarking problems set will be useful for researchers conducting similar studies.

## Keywords

Metamodelling, Kriging, Radial basis functions, Polynomials, Correlation function, Kernel functions, Response surfaces

## 1 Introduction

### 1.1 Background

High fidelity simulation modelling of complex engineered systems, based on tools such as finite element analysis (FEA) and computational fluid dynamics (CFD), plays an increasingly important role because it enables detailed analysis and optimisation of the design at an early stage, delivering significant cost and time compression in product development. However, complex engineering simulations are computationally expensive, which often precludes efficient design exploration through parametric studies and optimisation, or the integration of multiple simulations in multi-physics models to study complex multidisciplinary systems. Metamodelling or surrogate modelling (Box and Draper 1987), underpinned by response surface modelling techniques originally introduced to develop prediction models for expensive physical experimental responses (Simpson et al. 2001b), is increasingly employed to replace complex simulation-based models across the engineering disciplines (e.g. Fang et al. 2005; Zhu et al. 2009). Metamodelling techniques not only provide relatively cheap and accurate response models to replace the expensive analysis tools, but also provide a filtering method to handle noisy data (Box and Draper 1987; Forrester et al. 2008).

Metamodelling techniques are commonly classified into parametric and non-parametric models (Rango et al. 2013). Parametric models, such as polynomials (Khan 2011), are explicitly dependent on the underlying model structure, whereas non-parametric methods, such as Radial Basis Function (RBF) (Khan 2011), Neural Network (Hagan and Demuth 1999) and Kriging models (Sacks et al. 1989), do not require explicit model assumptions and use the experimental measurements to define the underlying relationship among the parameters.

Despite the availability of different software packages that enable the application of various metamodelling methods, selecting the most robust modelling technique for a given practical engineering problem is still a challenge for the practitioner, especially when the system behaviour is unknown (Didcock et al. 2014). Several studies have been reported over the years evaluating the performance of metamodelling techniques; such studies have focussed either on identifying the most suitable type of model for a particular engineering application or set of modelling conditions (e.g. relating to the affordable sample size or uncertainty in the underpinning experiment) or, more generally, on the study of performance in relation to the type or characteristics of the problem (such as scale and nonlinearity). However, no study has systematically considered the complete typology of the modelling problems to drive the evaluation of metamodelling techniques, and thus a generic guideline for engineering practitioners is still not available.

The research presented in this paper aims to address this issue, with a study of a selection of the metamodelling techniques frequently used in practice, systematically evaluated in a comprehensive experiment designed to replicate the characteristics of the metamodelling problems commonly encountered in product design and development of engineered systems in industries such as automotive and aerospace. Metamodelling problems associated with both physical experiments (such as steady-state engine calibration tests) and computer-based experiments (e.g. based on FEA, CFD, or multi-physics simulation experiments) are within the scope for this study. An analysis of literature relating to the evaluation of metamodelling techniques will be considered first in order to facilitate the identification of the key characteristics of engineering metamodelling problems, supporting the development of a coherent methodology for the study.

### 1.2 Review of related work

- (i) The metamodelling methods considered in each study
- (ii) The characteristics of the engineering problems considered in the study

The review and associated analysis are organised in chronological order to provide a view on both the use of metamodelling techniques in engineering and the expanding range of problems and issues tackled.

Simpson et al. (1998) have analysed the performance of the Kriging method against Polynomials for a three-dimensional multidisciplinary optimisation problem of an aerospike nozzle design based on FE and CFD, arguing that Kriging based on a space-filling design can provide competitive modelling performance for the engineering design problem considered. Giunta et al. (1998) compared Polynomials with Kriging for three problems with different numbers of parameters (1, 5 and 10 parameters), based on mathematical benchmark problems. Over the specific set of problems considered, they observed that a quadratic polynomial consistently delivers highly accurate prediction models. Yang et al. (2000) compared four metamodelling techniques (i.e. Moving Least Square, Neural Network (NN), Stepwise Regression, and Multivariate Adaptive Regression Splines (MARS)) to predict a low-dimensional safety function in an automotive crash model. Varadarajan et al. (2000) compared the performance of Neural Networks and Polynomial methods for an engine combustion modelling application. The research by Jin et al. (2001) reports the first systematic comparative approach to study the performance of four metamodelling methods (i.e. Radial Basis Function (RBF), Kriging, Polynomial and Regression Splines), over different sampling sizes (i.e. scarce, small and large), and for different problem types in terms of scale and nonlinearity. Thirteen case studies were employed to develop a standard procedure to compare the metamodelling methods, including an extra noisy test function to briefly investigate how the performance of metamodels is affected by the noisy conditions, concluding that RBF with Gaussian kernel function performs the best across the case studies considered. However, the test functions considered are unduly biased towards only two problem dimensionality cases: small (with less than 4 variables) and large (with 10 and more parameters), not covering the medium size test functions. 
In a different study, Jin et al. (2003) compared the performance of Kriging, RBF and Polynomials for modelling a structural engineering problem, considering uncertainty in the values of parameters, over two sample sizes (small and large). They observed that the Kriging method with a Gaussian correlation matrix outperforms the other techniques and that the accuracy of metamodels was not affected by the sample size in the range considered (in particular for the RBF method). Also, the second-order polynomial was observed to be incapable of capturing the nonlinearity of the performance variations for the case study problem. Seabrook et al. (2003) compared NN, Kriging and RBFs for engine calibration experiments where the results are affected by experimental noise, concluding that Kriging offers the most robust modelling technique. Forsberg and Nilsson (2004) compared the performance of Kriging and Polynomial methods in modelling crashworthiness of a vehicle structure, for which Kriging was found to be more accurate. Fang et al. (2005) also compared RBF (using five different kernel functions) and polynomial techniques for a crashworthiness application. They reported that both modelling methods are accurate for the case study (especially the RBF with Multiquadric kernel function), while RBF outperforms Polynomial for smaller sampling sizes. In the comparative study presented by Mullur and Messac (2005), four metamodelling techniques were studied (Polynomials, RBF, Kriging and a RBF-based method named extended RBF) over 8 case studies selected from the ones used in the Jin et al. study. The study focused on the effects of different problem scales (small and large) and sample sizes (small, medium and large) using 2 sampling strategies (Latin Hypercube and random sampling); the extended RBF was found to outperform the other methods. Chen et al. 
(2006) studied the performance of RBF, Kriging, Polynomial, Neural Network and MARS methods over three case studies (two 2-dimensional test functions and one 10-dimensional engineering problem). For the limited number of case studies considered, they also investigated the effects of different sampling strategies. Ben-Ari et al. (2007) presented a comparative study of the performance of three metamodelling strategies (Splines, Kriging and projection pursuit regression) for three simulation-based case studies, concluding that while Kriging provides the best modelling technique, it is also the most computationally expensive. Kim et al. (2009) studied the performance of four metamodelling techniques (RBF, Kriging, Moving Least Square (MLS) and Support Vector Regression (SVR)) over six low-dimensional case studies, using two sample sizes (small and large). They reported that MLS and Kriging provided superior results compared to the other methods; however, these methods were found ineffective when the sample size is too small. In the study reported by Zhu et al. (2009), RBF, Kriging, SVR and NN methods were used to model a real-world engineering case study in the automotive engineering field, for which SVR outperformed the other techniques. Paiva et al. (2009) compared the performance of Kriging, quadratic polynomial and Neural Network over three aircraft applications. They observed that both Kriging and Neural Network delivered high fidelity models for large-scale problems. Zhao and Xue (2010) applied four metamodelling strategies (RBF, Kriging, linear polynomials and Bayesian neural network) to model six test functions (three small-scale and three large-scale problems), selected from Jin et al. (2001). This research also considered the effect of different sample sizes and the influence of different levels of noise conditions. In the study reported by Campean et al. 
(2010), Polynomial, Kriging and RBF (with different kernel functions including Thinplate, Multiquadric and Linear) metamodels were evaluated for engine fuel consumption and gaseous emissions modelling, based on noisy engine test data (with different levels of noise associated with the measurement of different responses), concluding that Kriging outperforms the other modelling techniques. Li et al. (2010) have reported an evaluation of the performance of SVR against NN, RBF, Kriging and MARS for a range of benchmark engineering problems with between 2 and 8 variables, also considering the effect of homogeneous and heterogeneous stochastic error, induced via simulation. They have concluded that SVR performs best, closely matched by Kriging, while RBFs provide a computational advantage for larger samples. They have also validated the performance of SVR on several simulation-based process optimisation case studies. Similarly, Wang et al. (2011) have compared the performance of SVR against polynomial regression, Kriging, RBF, NN and MARS for several nonlinear benchmark problems, concluding that SVR provides the most robust modelling technique. However, both studies relied on large sample sizes for the model fitting. Van Gelder et al. (2014) carried out a comparative study of metamodelling techniques (Polynomial, MARS, Kriging, RBF and NN) for large-scale building simulation, considering probabilistic input parameters and the effect of the training sample size on the reliability of the metamodelling strategy. Their aim was to derive a guideline for practitioners, and they concluded that NN and Kriging performed best, with Kriging requiring a smaller training set. They have, however, pointed out that Kriging models are harder to interpret. Liu et al. 
(2016) considered the impact of problem dimensionality (nine engineering benchmark problems with between 4 and 51 variables) and the space filling sampling strategy and size (between 10× and 40×) on the performance of metamodelling strategies (Polynomial, Kriging, RBF and RBF-HDMR—i.e. high dimensional model representation), concluding that RBF performed best. Kroetz et al. (2017) contrasted performance of Kriging models against Neural Networks and Polynomial Chaos Expansion (PCE) metamodelling techniques for a range of structural reliability problems and concluded that Kriging and NN models converge efficiently compared to PCE. Chen et al. (2019) studied the problem of efficiency of metamodelling in high dimensionality problems considering a set of nine test functions, the influence of sample size and presence of random noise, for a selection of metamodelling techniques including Kriging, RBF, SVR and a range of high dimensional model representation derivatives of these methods. They have found that Kriging (with Gaussian kernel) does not provide satisfactory global modelling results even with large sample sets. Østergård et al. (2018) evaluated the performance of six metamodelling techniques (Linear Regression, Random Forest (RF), SVR, MARS, Kriging, and NN) in relation to building performance simulation, focussing both on performance (accuracy, efficiency and robustness) and user-focussed metrics such as ease of use and interpretability. Their review considered 13 benchmark problems, with varying dimensionality and complexity (but not including “noisy” problems as these were not relevant to the application domain), and including both discrete and continuous variable problems. They have attempted to provide some guidance for metamodelling choice in relation to the level of expertise of the analyst (expert/ non-expert / automatic metamodelling) and the time/cost afforded for the experiment (large / limited / minimal). 
They have concluded that Kriging offers the most accurate metamodelling approach, although not the most computationally efficient, and providing models that are not easy to interpret by the user in the same way in which Polynomial regression models can be interpreted.

Table 1 Summary of literature cases: metamodelling techniques and test problems considered

Study | Metamodelling techniques compared | Test problem
---|---|---
Simpson et al. (1998) | Polynomial, Kriging | Aerospike nozzle design
Giunta and Watson (1998) | Polynomial, Kriging | Mathematical benchmark problems (1, 5 and 10 parameters)
Yang et al. (2000) | MLS, NN, Stepwise Regression, MARS | Crashworthiness simulation
Varadarajan et al. (2000) | Polynomial, NN | Engine combustion modelling
Jin et al. (2001) | Polynomial, Kriging, RBF, Regression Splines | Thirteen benchmark case studies
Jin et al. (2003) | Polynomial, Kriging, RBF | Structural design
Seabrook et al. (2003) | Kriging, RBF, NN | Engine mapping experiments
Forsberg and Nilsson (2004) | Polynomial, Kriging | Crash simulation
Fang et al. (2005) | Polynomial, RBF | Crash simulation
Mullur and Messac (2006) | Polynomial, Kriging, RBF, extended RBF | Eight case studies from the Jin et al. study
Chen et al. (2006) | Polynomial, Kriging, RBF, NN, MARS | Two 2-dimensional test functions; wastewater treatment
Ben-Ari and Steinberg (2007) | Kriging, Splines, projection pursuit regression | Piston dynamics, electric circuit
Kim et al. (2009) | Kriging, RBF, MLS, SVR | Six low-dimensional case studies
Zhu et al. (2009) | Kriging, RBF, NN, SVR | Automotive body structure
Paiva et al. (2009) | Quadratic polynomial, Kriging, NN | Aircraft wing design
Zhao and Xue (2010) | Linear polynomial, Kriging, RBF, Bayesian NN | Six test functions from Jin et al. (2001)
Campean et al. (2010) | Polynomial, Kriging, RBF | Engine modelling experiments
Li et al. (2010) | Kriging, RBF, NN, MARS, SVR | Process simulation
Wang et al. (2011) | Polynomial, Kriging, RBF, NN, MARS, SVR | Crashworthiness optimisation
Van Gelder et al. (2014) | Polynomial, Kriging, RBF, NN, MARS | Building simulation
Liu et al. (2016) | Polynomial, Kriging, RBF, RBF-HDMR | 9 engineering problems
Kroetz et al. (2017) | Kriging, NN, PCE | Structural reliability
Chen et al. (2019) | Kriging, RBF, SVR, HDMR derivatives | Nine test functions
Østergård et al. (2018) | Linear Regression, RF, SVR, MARS, Kriging, NN | 13 benchmark problems (building performance simulation)

In relation to computer-based experiments, the improvements in computation power and speed have enabled metamodelling studies for problems of significantly increased dimensionality and based on larger samples, which in turn supported the development and application of an increasingly diverse range of metamodelling techniques. Several studies compared the effect of different mathematical options for a particular modelling technique, such as correlation function options for Kriging (Toal and Keane 2013; Kleijnen and van Beers 2005; Ulaganathan et al. 2014, 2015) or kernel function options for RBF (Campean et al. 2010; Fang et al. 2005), over different applications. The impact of the hyperparameters associated with the modelling techniques and of the model selection criteria has also been increasingly considered in comparative studies (see, for example, Østergård et al. 2018). The continued research interest in comparative metamodelling studies over time reflects the fact that this problem is still of significant real-world impact. The fact that different practical test problems and application areas appear to point to different optimal metamodelling choices conflicts with the practical engineering interest, which usually revolves around prescriptive guidelines for a robust choice of a metamodelling technique for a specific type of engineering problem. Efforts to develop such guidelines are well illustrated by studies such as Van Gelder et al. (2014) and Østergård et al. (2018) for high dimensional problems of building performance simulations, or Blondet et al. (2019) for mechanical engineering simulations, underpinned by a knowledge-based system. Examples of generic recommendations for the choice of metamodelling techniques for physical experiments associated with engine tests are illustrated by the work of Seabrook et al. (2003) and Berger (2012).

Widely available commercial software, such as Matlab (e.g. the Model-Based Calibration toolbox), provides engineers with easy access to a very large choice of metamodelling techniques, including model fitting options (hyperparameters and model selection settings). However, this raises modelling productivity and effectiveness issues for practical application, for reasons discussed by Østergård et al. (2018), which justifies the need for further studies to support the practical choice of metamodelling methods.

### 1.3 Research objectives and contribution

- 1. Characteristics of the problem type:
  - (a) *Dimensionality* or scale of the problem, typically indicated by the number of independent variables involved
  - (b) Complexity of the problem, defined in relation to the relationship among factors for approximation and usually quantified in terms of *nonlinearity*, interaction between variables and importance of terms (Shan and Wang 2010)
- 2. Modelling choices and conditions:
  - (a) The *sample size* practically afforded for the experiment
  - (b) Uncertainty or *noise*, relating to the experimental measurement uncertainty/noisiness

The objective of the research presented in this paper was to carry out a systematic and comprehensive study of metamodelling techniques with the aim of developing a framework for the robust choice of method and model based on both the type of the engineering problem (scale and nonlinearity) and the modelling conditions or constraints (uncertainty and size of the metamodelling experiment). The scope of the study was set on three of the most popular metamodelling techniques in the engineering field, namely Polynomial, RBF and Kriging. This choice is justified by the analysis of previous studies (these three methods are seen as the most popular in the analysis in Table 1), by their availability in common engineering software packages, and by the authors’ extensive experience and observation of practice in automotive product development. The key points observed in relation to practical engineering metamodelling problems, in particular early in product development, relate to (i) affordability of larger sample sizes for both computer-based experiments (including crash simulation, CFD and structural simulations, as well as increasingly used multi-physics simulations) and physical experiments (typified by the expensive engine calibration experiments) and (ii) uncertainty about the variables and parameters involved in the modelling experiments. While Neural Networks are also commonly used metamodelling techniques, most of the studies considered in Table 1 point to the fact that limitations in sample size severely affect the performance of NNs for engineering modelling problems in product development. For this reason, NNs were not included in the study.
While most of the recent research, in particular in relation to large-scale nonlinear metamodelling problems, points out the limitations of Polynomial regression models, from a practical point of view they are still important because, as models, they are easier for engineering analysts to interpret (van Gelder et al. 2014). They are still widely seen in practice as the first preference of engineers, and for this reason they were included in this study.

- 1. An extensive theoretical (i.e. based on known mathematical functions) study of the performance of metamodelling techniques across a range of problems of different scale and nonlinearity, and with consideration of the size of experimental training set for model fitting and sensitivity to noise in the response variables. In order to ensure the validity of the study, a set of 18 benchmark problems extracted from previous studies was assembled to provide a representative and balanced cover across the problem parameters. Based on the findings from this study, a guideline for preferred metamodelling technique based on problem criteria is proposed.

- 2. A set of validation studies based on real-world engineering problems with experimental data from both physical and computer-based simulation experiments/tests. These studies are used as empirical validation for the proposed guideline.

- (i) Presenting a comprehensive and complete evaluation study of common metamodelling techniques in relation to the characteristics of engineering problem types (dimensionality and complexity) and modelling conditions (experimental sample size and noisiness);

- (ii) Defining a set of benchmarking problems with systematic coverage of criteria for metamodelling problems; and

- (iii) Introducing a framework for guiding selection of metamodelling technique based on the problem classification criteria, with relevance to the typology of problems commonly seen in product design and development.

The impact of the paper on the research community is defined by the first two points, while the last point should provide the basis for the impact of the study on practitioners in the engineering design, modelling and simulation community.

The organisation of the paper is as follows: first, the mathematical properties of the three metamodelling techniques considered in this study are outlined, followed by a detailed description of the research methodology, including the benchmark test functions and the procedure to evaluate the properties of the constructed models. The results from different modelling techniques are presented next and analysed for different problem types and modelling conditions to provide a sound guideline to choose the best metamodelling technique for different classes of problems. The proposed guideline is validated on four real-life engineering problems. The paper ends with a discussion of the results and directions for further work.

## 2 Metamodelling techniques

Three types of metamodelling techniques are included in this study: Polynomial, Radial Basis Function and Kriging. The principal features of these techniques and the corresponding mathematical representations are described in the following sections.

### 2.1 Polynomial

Polynomials have been frequently used by engineers to predict the behaviour of complex engineering systems (Lach et al. 2007; Schlober et al. 2007). The general form of a polynomial model can be expressed as (Morris and Mitchell 1995; Myers et al. 1989):

$$y(x) = \sum_{i=1}^{P} a_{i} f_{i}(x) \qquad (1)$$

where *y*(*x*) predicts the response value at design parameters *x* by linear combination of a set of base functions *f*. In this equation, *P* indicates the number of base functions corresponding to a polynomial order, and *a* defines the regression coefficients for each of the base functions. These coefficients are usually retrieved through a linear regression based on least squares estimation (Chen et al. 2006). The main reasons for the popularity of polynomials compared to the non-parametric metamodelling techniques are:

- Polynomials have a simple structure, which makes them easy to understand and manipulate (Hartmann et al. 2013).
- Polynomials require low computational effort due to linearity in the unknown parameters (Rango et al. 2013).
- Low-order polynomials show less possibility of over-fitting (i.e. better smoothing capability (Jin et al. 2001)), which is particularly important for modelling noisy measurements (Khan 2011).

Regardless of these advantages, there are some important limitations in using polynomial metamodels for complex responses (i.e. highly nonlinear or large-scale problems) since the number of model terms increases significantly, increasing the number of experimental measurements required to calculate the coefficients (Khan, 2011). In this research, the second- and third-order polynomials were used, while the maximum order of parameter interactions was limited to 2.
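As an illustration of the least squares estimation described above, the following minimal Python sketch fits a second-order polynomial with two-way interactions using NumPy. The function names (`quadratic_basis`, `fit_polynomial`, `predict_polynomial`), the synthetic response and the sample size are assumptions made for this example and are not taken from the study.

```python
import itertools
import numpy as np

def quadratic_basis(X):
    """Design matrix with columns [1, x_k, x_k * x_l (k <= l)] for each row of X."""
    n, d = X.shape
    columns = [np.ones(n)]
    columns += [X[:, k] for k in range(d)]
    columns += [X[:, k] * X[:, l]
                for k, l in itertools.combinations_with_replacement(range(d), 2)]
    return np.column_stack(columns)

def fit_polynomial(X, y):
    """Least squares estimate of the coefficients a in y ~ F(X) @ a."""
    F = quadratic_basis(X)
    a, *_ = np.linalg.lstsq(F, y, rcond=None)
    return a

def predict_polynomial(a, X):
    """Evaluate the fitted polynomial at new points."""
    return quadratic_basis(X) @ a

rng = np.random.default_rng(0)
X_train = rng.uniform(-1.0, 1.0, size=(30, 2))
# Hypothetical noise-free response that is itself a quadratic,
# so the fit should recover its coefficients.
y_train = (1.0 + 2.0 * X_train[:, 0] - X_train[:, 1]
           + 0.5 * X_train[:, 0] * X_train[:, 1])

coeffs = fit_polynomial(X_train, y_train)   # order: 1, x0, x1, x0^2, x0*x1, x1^2
y_new = predict_polynomial(coeffs, np.array([[0.2, -0.4]]))
```

For two variables the second-order model has six base functions, and the count grows quadratically with the number of variables, which is the sample-size burden noted above for large-scale problems.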

### 2.2 Radial Basis Function (RBF)

Radial Basis Function is a non-parametric metamodelling technique which was introduced as a variant of Neural Network models in the late 1980s (Chen et al. 1996). This interpolation-based modelling method uses linear combinations of radially symmetric functions (called kernel functions) (Jin et al. 2001). A general form of an RBF model can be described by (Khan 2011):

$$y(x) = \sum_{i=1}^{n} w_{i} \, \Phi\left(\lVert x - x^{i} \rVert\right), \qquad \sum_{i=1}^{n} w_{i} \, \Phi\left(\lVert x^{i} - x^{j} \rVert\right) = t^{j}, \quad j = 1, \ldots, n \qquad (2)$$

where *x*^{i} is the *i*’th measured point in the training data, *n* is the number of measured points, *Φ*() indicates the kernel function (i.e. the shape of the radially symmetric function), *w*_{i} is the weight given to the *i*’th kernel function, *t*^{i} is the measured output at *x*^{i} and *ǁx*^{i} − *x*^{j}*ǁ* is the distance between the measured points *i* and *j*. RBF models can use different kernel functions, such as Gaussian, Multiquadric, Thinplate, Spline, Cubic and Linear functions (Khan 2011).

The main advantages of RBF models are:

- RBF can learn the complex and nonlinear relationship between the input and output parameters (Hagan and Demuth 1999).
- RBF can cope with missing and noisy data with a good generalisation capability (Hagan and Demuth 1999).
- RBF models can be more efficient (in relation to the number of measurements required) than polynomials for large-scale nonlinear problems, since the approximation model is derived purely from the collected data (through learning), rather than assuming a fixed model type in advance (He and Rutland 2004).
- RBF is very fast in learning the relationship between the input and output variables because it uses two-stage network training: the first stage determines the weights from the ‘Input’ to the ‘Hidden’ layer, and the second stage those from the ‘Hidden’ layer to the ‘Output’ layer (Khan 2011).

Although the RBF method has proved to be an efficient modelling technique for complex problems, such as modelling of internal combustion engines (Howlett et al. 1999), there is always the possibility of over-fitting (Forrester et al. 2008), particularly for noisy measurements.

In this paper, the performance of the RBF method is investigated by applying three of the frequently used kernel functions: Gaussian, Multiquadric and Thinplate (Kim et al. 2009; Rango et al. 2013).
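A minimal sketch of RBF interpolation with the three kernel families named above is given here for illustration. The kernel parametrisations, the shape parameter `c`, the helper names and the synthetic data are assumptions of this example; in particular, the Thinplate kernel is commonly paired with an additional polynomial term, which is omitted here, so the demonstration uses the Multiquadric kernel only.

```python
import numpy as np

def kernel(r, name, c=1.0):
    """Evaluate a radially symmetric kernel at distances r (c is a shape parameter)."""
    if name == "gaussian":
        return np.exp(-(r / c) ** 2)
    if name == "multiquadric":
        return np.sqrt(r ** 2 + c ** 2)
    if name == "thinplate":
        # r^2 * log(r), using the limit value 0 at r = 0
        safe_r = np.where(r > 0.0, r, 1.0)
        return r ** 2 * np.log(safe_r)
    raise ValueError(f"unknown kernel: {name}")

def pairwise_distances(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

def fit_rbf(X, t, name="multiquadric", c=1.0):
    """Solve the interpolation conditions Phi @ w = t for the weights w."""
    Phi = kernel(pairwise_distances(X, X), name, c)
    return np.linalg.solve(Phi, t)

def predict_rbf(X_train, w, X_new, name="multiquadric", c=1.0):
    """Weighted sum of kernels centred on the training points."""
    return kernel(pairwise_distances(X_new, X_train), name, c) @ w

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(15, 2))
# Hypothetical nonlinear response used only for this demonstration.
t_train = np.sin(4.0 * X_train[:, 0]) * np.cos(3.0 * X_train[:, 1])

w = fit_rbf(X_train, t_train, name="multiquadric", c=0.3)
t_fit = predict_rbf(X_train, w, X_train, name="multiquadric", c=0.3)
t_new = predict_rbf(X_train, w, np.array([[0.5, 0.5]]), name="multiquadric", c=0.3)
```

Because the weights are obtained by exact interpolation, the model reproduces the training outputs, which is also why noisy measurements carry a risk of over-fitting.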

### 2.3 Kriging

The theoretical basis of Kriging modelling was developed by Matheron (Matheron 1976), based on the original work by Krige (Forrester et al. 2008). Kriging metamodels are derived from (3) (Sacks et al. 1989; Simpson et al. 2001a), which postulates a combination of a global model (*f*) and localised departures (*G*):

$$y(x) = f(x) + G(x) \qquad (3)$$

where *y* is predicted at a point *x* using a global approximation function *f*, which is usually a polynomial, and ‘localized’ deviations *G*, which are calculated by interpolation of the measured sample points. *G*(*x*) denotes the realisation of a weakly stationary stochastic process with mean 0, process variance *σ*^{2} and nonzero covariance function, as given by (4) (Khan 2011):

$$\mathrm{Cov}\left[G(x^{i}), G(x^{j})\right] = \sigma^{2} R(x^{i}, x^{j}) \qquad (4)$$

where *R*(*x*^{i}, *x*^{j}) represents the correlation between any two of the measured sample points, *x*^{i} and *x*^{j}. *R* is assumed to be a function of a small set of parameters, which are estimated based on the Likelihood function (Forrester et al. 2008). The Likelihood function defines the probability of the measured outputs, given a specific set of parameter values (Cressie 1990).

A variety of correlation functions are available in literature to fit a Kriging model (Couckuyt et al. 2012; Fang et al. 2005; Kaymaz 2005; Kersaudy et al. 2015; Kleijnen 2009; Kleijnen and van Beers 2005; Moyeed and Papritz 2002; Passos et al. 2015; Picheny et al. 2013a; Ulaganathan et al. 2014, 2015; Wang and Shan 2005). For this study, a selection of commonly used correlation functions, i.e. Gaussian (de Oliveira et al. 2013; Kleijnen 2009; Passos et al. 2015; Picheny et al. 2010), Matérn 3/2 (Picheny et al. 2013a; Ulaganathan et al. 2014, 2015) and Matérn 5/2 (Ulaganathan et al. 2014, 2015), is considered for the study of the performance of the Kriging metamodels. The common choice for the functional form for the global function *f* (see (3)) which gives the overall trend is a polynomial (linear or quadratic). However, for this study, a constant term (zero order polynomial) has been chosen, similar to the approach commonly used in engineering practice—see for example Forsberg and Nilsson (2004) and van Gelder (2014). This approach is referred to as ordinary Kriging, a particular case of universal Kriging—which uses a polynomial global trend function.
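For reference, one common parametrisation of the three correlation functions named above is sketched here. The scaled distance $r_{ij}$, the per-variable length-scale hyperparameters $\theta_k$ and the number of variables $d$ are notation assumed for this sketch; the exact parametrisation used in the study is not stated in this section and may differ.

```latex
\begin{align}
r_{ij} &= \sqrt{\sum_{k=1}^{d} \left( \frac{x^{i}_{k} - x^{j}_{k}}{\theta_{k}} \right)^{2}} \\
R_{\text{Gaussian}}(x^{i}, x^{j}) &= \exp\left( -\tfrac{1}{2} r_{ij}^{2} \right) \\
R_{\text{Mat\'ern 3/2}}(x^{i}, x^{j}) &= \left( 1 + \sqrt{3}\, r_{ij} \right) \exp\left( -\sqrt{3}\, r_{ij} \right) \\
R_{\text{Mat\'ern 5/2}}(x^{i}, x^{j}) &= \left( 1 + \sqrt{5}\, r_{ij} + \tfrac{5}{3} r_{ij}^{2} \right) \exp\left( -\sqrt{5}\, r_{ij} \right)
\end{align}
```

All three functions equal 1 at zero distance and decay as the scaled distance grows. The Gaussian form corresponds to an infinitely differentiable process, whereas the Matérn 3/2 and 5/2 forms correspond to processes that are once and twice differentiable respectively, so they impose weaker smoothness assumptions on the response.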

_{e}), which measures the amount of random variation over the *n*-dimensional design space, through a so-called nugget effect (Kleijnen, 2009; Kleijnen and van Beers 2005; Picheny et al. 2013a; Chen et al. 2018, 2019). In a ‘nugget effect’ Kriging the nonzero covariance function can be defined as:

where *I* is an *n × n* identity matrix. The performance of Kriging with a nugget effect is not well documented in the existing literature.

- Kriging models are highly flexible due to the wide range of correlation functions.
- Kriging models require fewer measurements due to the strong interpolation among the measured sample points.
- Kriging models can either ‘honour the data’ by providing an exact interpolation of the data, or ‘smooth the data’ by providing an inexact interpolation.

A potential pitfall of using Gaussian Kriging models is the *curse of over-fitting* (Forrester et al., 2008), particularly with noisy sets of measurements. Also, implementation of Kriging models is a relatively time-consuming process since determining the maximum likelihood parameters is a complex optimisation problem (Khan, 2011).

It is noteworthy that the performance of Kriging can be compromised by the type of designed experiment used to collect the experimental data. The method has shown some difficulty in fitting response models for Full Factorial and Central Composite Design of Experiments (DoE) methods (Meckesheimer et al., 2001), since the correlation matrix becomes almost singular when multiple sample points are located close to each other.
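The near-singularity issue can be illustrated with the smallest possible case: two sample points and a 2 × 2 correlation matrix. The sketch below is an illustration only; it assumes a Gaussian correlation function with unit length scale and a nugget expressed as a ratio added to the diagonal, neither of which is taken from the study.

```python
import math

def gaussian_correlation(distance, theta=1.0):
    """Gaussian correlation between two points separated by `distance`."""
    return math.exp(-0.5 * (distance / theta) ** 2)

def correlation_determinant(distance, nugget=0.0):
    """Determinant of the 2x2 matrix [[1 + nugget, r], [r, 1 + nugget]].

    `nugget` is a hypothetical relative nugget added to the diagonal.
    A determinant close to zero means the matrix is close to singular.
    """
    r = gaussian_correlation(distance)
    diag = 1.0 + nugget
    return diag * diag - r * r

# Well-separated points: comfortably non-singular.
print(correlation_determinant(1.0))
# Nearly coincident points: determinant collapses towards zero.
print(correlation_determinant(0.01))
# The same nearly coincident points with a small nugget on the diagonal.
print(correlation_determinant(0.01, nugget=0.01))
```

As the two points approach each other the off-diagonal correlation tends to 1 and the determinant tends to 0, while a nugget term on the diagonal keeps the matrix away from singularity.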

## 3 Research methodology

### 3.1 Test functions and test cases

To investigate the performance of different metamodelling strategies, 18 test functions were selected from literature to be representative of the range of dimensionality and complexity (nonlinearity) seen across the different engineering problems discussed in related literature, and seen by the authors in engineering practice. The mathematical definitions of selected test functions are listed in Appendix A.

Table 2 Experimental matrix for the study: summary of test functions and their characteristics

Test function | No of variables (*k*) | Problem type: scale | Problem type: nonlinearity | DoE size (test points): MBs (10*k*) | DoE size (test points): MBl (30*k*) | DoE size (test points): validation | Noise
---|---|---|---|---|---|---|---
Test 1 | 2 | Small | Low | 20 | 60 | 1000 | 0–15%
Test 2 | 3 | Small | Low | 30 | 90 | 1000 | 0–15%
Test 3 | 4 | Small | Low | 40 | 120 | 1000 | 0–15%
Test 4 | 2 | Small | High | 20 | 60 | 1000 | 0–15%
Test 5 | 3 | Small | High | 30 | 90 | 1000 | 0–15%
Test 6 | 4 | Small | High | 40 | 120 | 1000 | 0–15%
Test 7 | 5 | Medium | Low | 50 | 150 | 1500 | 0–15%
Test 8 | 6 | Medium | Low | 60 | 180 | 1500 | 0–15%
Test 9 | 8 | Medium | Low | 80 | 240 | 1500 | 0–15%
Test 10 | 6 | Medium | High | 60 | 180 | 1500 | 0–15%
Test 11 | 7 | Medium | High | 70 | 210 | 1500 | 0–15%
Test 12 | 8 | Medium | High | 80 | 240 | 1500 | 0–15%
Test 13 | 10 | Large | Low | 100 | 300 | 2000 | 0–15%
Test 14 | 16 | Large | Low | 160 | 480 | 2000 | 0–15%
Test 15 | 20 | Large | Low | 200 | 600 | 2000 | 0–15%
Test 16 | 10 | Large | High | 100 | 300 | 2000 | 0–15%
Test 17 | 14 | Large | High | 140 | 420 | 2000 | 0–15%
Test 18 | 18 | Large | High | 180 | 540 | 2000 | 0–15%

In terms of the number of variables (*k*), the 18 functions can be classified into three categories:

- Small-scale (*k* ≤ 4) (6 test functions)
- Medium-scale (5 ≤ *k* ≤ 8) (6 test functions)
- Large-scale (*k* ≥ 9) (6 test functions)

In terms of complexity (nonlinearity), they fall into two categories:

- Low-order nonlinear problems (9 test functions)
- High-order nonlinear problems (9 test functions)

For the classification of complexity, the test functions for which the value of *R*^{2} is more than 0.99, when constructing a second-order polynomial metamodel, are considered low-order nonlinear, otherwise they are considered as high-order nonlinear (Jin et al. 2001; Mullur and Messac 2005; Zhao and Xue 2010).
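The classification rule can be sketched for a single input variable as follows. This is a minimal illustration under stated assumptions, not the study's procedure: the helper names are hypothetical, the fit uses ordinary least squares via the normal equations, and *R*^{2} is computed on the fitting points (the study does not specify this detail here).

```python
import math

def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def quadratic_r2(xs, ys):
    """R^2 of a least-squares fit y ~ b0 + b1*x + b2*x^2 (one input variable)."""
    basis = [[1.0, x, x * x] for x in xs]
    # Normal equations: (B^T B) beta = B^T y
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    bty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    beta = solve_linear(btb, bty)
    preds = [sum(b * t for b, t in zip(beta, row)) for row in basis]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def classify_nonlinearity(r2, threshold=0.99):
    """'low' if a second-order polynomial explains more than `threshold` of the variance."""
    return "low" if r2 > threshold else "high"

xs = [i / 49 for i in range(50)]
smooth = [1 + 2 * x + 3 * x * x for x in xs]   # exactly quadratic
wavy = [math.sin(6 * x) for x in xs]           # almost a full sine period
r2_smooth = quadratic_r2(xs, smooth)
r2_wavy = quadratic_r2(xs, wavy)
print(classify_nonlinearity(r2_smooth), classify_nonlinearity(r2_wavy))
```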

- Small DoE sample size (denoted as MBs in Table 2): an Optimal Latin Hypercube (OLH) design with a number of test points ‘10×number of parameters’ (Jin et al. 2001; Loeppky et al. 2009).
- Large DoE sample size (MBl in Table 2): an OLH design with a number of test points ‘30×number of parameters’ (Jin et al. 2001; Mullur and Messac 2005; Zhao and Xue 2010).

The main reason for using an OLH design to generate the test points is the ability of this sampling strategy to deliver uniformly distributed test points within the range of parameters, together with its flexibility in the number of test points (Fang et al. 2010; Sacks et al. 1989). In this paper, a Permutation Genetic Algorithm is used to generate the Optimal Latin Hypercube test points based on the Audze-Eglais objective function (Kianifar et al. 2015).
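To make the sampling idea concrete, the sketch below generates random Latin hypercubes and scores them with the Audze-Eglais potential (the sum of inverse squared distances between all pairs of points, where lower is more uniform). It deliberately replaces the Permutation Genetic Algorithm with a crude "best of many random candidates" search, so it is a stand-in for illustration, not the optimiser used in the study; all function names are hypothetical.

```python
import random

def latin_hypercube(n_points, n_dims, rng):
    """Random Latin hypercube in [0, 1]^n_dims: one point per stratum in every dimension."""
    columns = []
    for _ in range(n_dims):
        levels = list(range(n_points))
        rng.shuffle(levels)
        columns.append([(level + 0.5) / n_points for level in levels])
    return [tuple(col[i] for col in columns) for i in range(n_points)]

def audze_eglais(points):
    """Audze-Eglais potential: sum of 1 / d^2 over all point pairs (lower is better)."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += 1.0 / d2
    return total

def best_of_random_lhs(n_points, n_dims, n_candidates=200, seed=0):
    """Crude substitute for an optimiser: keep the best of several random hypercubes."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_points, n_dims, rng) for _ in range(n_candidates))
    return min(candidates, key=audze_eglais)

k = 2                                  # number of parameters
design = best_of_random_lhs(10 * k, k) # small DoE: 10 x number of parameters
print(len(design), audze_eglais(design))
```

A permutation-based optimiser would search the same space of level permutations more efficiently; the scoring function is the part that carries over.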

The validation sets designed to check the metamodels’ prediction accuracy were also OLH DoEs. Three validation DoE sizes were considered (1000, 1500 and 2000 validation tests), depending on the problem dimensionality, as shown in Table 2.

- Base test: ‘0%’ noise condition (or smooth data);
- Repeat 1: ‘5%’ noise condition;
- Repeat 2: ‘10%’ noise condition;
- Repeat 3: ‘15%’ noise condition.

The choice of these levels of noise was based on comprehensive experimental studies such as those reported in Berger (2012) and Kianifar et al. (2014), which systematically considered the experimental variability associated with modelling different engine responses.
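A noise condition of this kind could be simulated as sketched below. The exact definition of the noise percentage is not given in this section, so the code makes an explicit assumption: the standard deviation of the Gaussian noise is the stated percentage of the range of the noise-free response. The function name is hypothetical.

```python
import random

def add_noise(y, noise_pct, seed=0):
    """Return y with zero-mean Gaussian noise added.

    Assumption (not from the study): the noise standard deviation is
    noise_pct percent of the range (max - min) of the noise-free response.
    """
    rng = random.Random(seed)
    spread = max(y) - min(y)
    sigma = noise_pct / 100.0 * spread
    return [value + rng.gauss(0.0, sigma) for value in y]

smooth_response = [x * x for x in range(11)]   # toy noise-free response
for pct in (0, 5, 10, 15):                     # the four conditions in the study
    noisy = add_noise(smooth_response, pct)
    print(pct, [round(v, 1) for v in noisy[:3]])
```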

### 3.2 Selection of Metamodelling techniques

- Two polynomial metamodels: order 2 (‘Poly 2’) and order 3 (‘Poly 3_c2’, i.e. cubic polynomial with interaction order 2);
- Three RBF metamodels: using Gaussian, Thinplate and Multiquadric kernel functions (shown as ‘RBF_G’, ‘RBF_TP’ and ‘RBF_MQ’);
- Three Kriging metamodels: using Gaussian (‘Krig_G’), Matérn 3/2 (‘Krig_M32’) and Matérn 5/2 (‘Krig_M52’) correlation functions;
- Three Kriging metamodels with nugget factor, using Gaussian (‘KrigN_G’), Matérn 3/2 (‘KrigN_M32’) and Matérn 5/2 (‘KrigN_M52’) correlation functions.

Thus, the base experiments included a total of 11 metamodels fitted for each of the 18 test functions considered. In order to account for the impact of the modelling conditions (sample size and noisiness of the response), the base experiment was repeated for “Small” and “Large” DoE sizes and different noise conditions (5%, 10% and 15%, in addition to the base experiments which considered a smooth signal—0% noise).
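The size of the resulting experimental plan follows from a full factorial combination of test functions, metamodel types, DoE sizes and noise levels. The short sketch below enumerates that plan; the labels are those used in the text, while the variable names are illustrative.

```python
from itertools import product

test_functions = [f"Test {i}" for i in range(1, 19)]
models = [
    "Poly 2", "Poly 3_c2",
    "RBF_G", "RBF_TP", "RBF_MQ",
    "Krig_G", "Krig_M32", "Krig_M52",
    "KrigN_G", "KrigN_M32", "KrigN_M52",
]
doe_sizes = ["MBs", "MBl"]       # small (10k) and large (30k) samples
noise_levels = [0, 5, 10, 15]    # percent

# Every combination is one metamodel to fit and validate.
plan = list(product(test_functions, models, doe_sizes, noise_levels))
print(len(models), len(plan))
```

With 18 test functions, 11 model types, 2 DoE sizes and 4 noise levels, the plan contains 18 × 11 × 2 × 4 = 1584 metamodels.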

### 3.3 Performance metrics and analysis procedure

NRMSE is the normalised discrepancy between the real values (*y*) of the *v* validation sample points and the corresponding prediction values (\( \hat{y} \)) (Hartmann et al. 2013). The smaller the value of NRMSE, the more accurately the metamodel predicts.

*R*^{2} (Ben-Ari and Steinberg 2007; Fang et al. 2005; Jin et al. 2003; Vicente-Serrano et al. 2003), is that non-parametric metamodels are susceptible to over-fitting, especially in noisy data conditions. Thus, by measuring the validation NRMSE, an evaluation is provided for the capability of a metamodel to predict the response behaviour over the design range rather than the ability to construct a model which follows the sample points (Fang et al. 2005; Vicente-Serrano et al. 2003).

- (b) The variance of NRMSE for a problem type. This provides an indicator of modelling robustness, consistent with Jin et al. (2001) and Chen et al. (2018, 2019): it reflects the capability of achieving good accuracy across different problems, and hence indicates whether a modelling technique is highly problem dependent. In the context of this study, the variance of NRMSE was calculated across the test-problem clusters defined through the experimental matrix, illustrated in Table 2. A smaller variance of NRMSE indicates consistent performance of the method across the type of problem, which can be taken as an indication of robustness.

- (c) The computational effort, taken as an indicator of efficiency. The time required to construct a metamodel is used to define the computational efficiency of the different metamodelling techniques for each of the problem types.
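The accuracy and robustness metrics can be sketched as follows. The NRMSE formula itself is not reproduced in this section, so the code assumes one common convention: the root mean squared error on the validation points divided by the range of the true validation responses. The study's normalisation may differ, and the function names are hypothetical.

```python
import math
import statistics

def nrmse(y_true, y_pred):
    """Normalised RMSE on validation points.

    Assumption (not from the study): RMSE is normalised by the range
    (max - min) of the true validation responses.
    """
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    return rmse / (max(y_true) - min(y_true))

def robustness(nrmse_values):
    """Sample variance of NRMSE across the problems in one cluster (lower is more robust)."""
    return statistics.variance(nrmse_values)

# Toy validation set: predictions are offset by a constant 0.4.
y_true = [0.0, 1.0, 2.0, 3.0, 4.0]
y_pred = [t + 0.4 for t in y_true]
print(nrmse(y_true, y_pred))

# Toy cluster of three problems modelled with the same technique.
print(robustness([0.02, 0.03, 0.025]))
```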

### 3.4 Validation test cases

Summary of the validation experiments

Type of experiment | Test problem | Problem characteristics
---|---|---
Physical | V1: Modelling of fuel consumption response for a diesel engine based on dynamometer testing data (Kianifar et al. 2014) | Medium scale; high-order nonlinearity
Physical | V2: Modelling of NOx response for a diesel engine based on dynamometer testing data (Kianifar et al. 2014) | Medium scale; high-order nonlinearity
Computer-based/CAE | V3: Modelling the head injury criteria (HIC) response for a tall man during a car-pedestrian impact based on dynamic CAE experiments (Zhao et al. 2010) | Medium scale; high-order nonlinearity
Computer-based/CAE | V4: Modelling of contaminant concentrations response in the upper aquifer of a waste disposal site in Moscow based on numerical simulation data (Marrel et al. 2008; Volkova et al. 2006) | Large scale; high-order nonlinearity

## 4 Results and analysis

Based on the designed research methodology, for each of the 18 case studies listed in Table 2, 11 model types are created under four noise conditions and two sampling sizes. Thus, a total of 1584 metamodels were constructed.

1. Small sample DoE
2. Large sample DoE
3. Sensitivity to noise

The efficiency of the metamodelling techniques (in terms of computation cost) will also be evaluated, before articulating a set of practical guidelines for the choice of metamodelling techniques.

### 4.1 Evaluation of metamodelling techniques for small sample DoEs

The results in Fig. 1 show that for small DoE sizes, Kriging without a nugget factor outperforms other modelling strategies in terms of robustness, judged on the average and range of NRMSE across the fitted models and the variance of NRMSE. In particular, Kriging with the Matérn 5/2 correlation function (shown as ‘Krig_M52’) performs slightly better than with the other correlation functions. It should be noted that the variance of NRMSE for all the modelling strategies is within an acceptable range (less than 0.05). Polynomial models can perform very well (in some cases NRMSE is 0), but their robustness is quite poor, as shown by the large range and variance of NRMSE. RBFs do not perform very well with small sample sizes; the Gaussian kernel appears to provide more robustness, but the worst accuracy (highest average NRMSE).

In order to better understand the behaviour of different metamodelling techniques, a deeper analysis is provided by evaluating the performance in relation to the problem characteristics—nonlinearity and scale.

Therefore, Kriging modelling technique with Matérn 5/2 correlation function is shown to be the best method in terms of both accuracy and robustness over all types of problems for small sampling sizes.

Polynomial models can deliver good models (the NRMSE whiskers extend to 0), but their robustness is worse than most other models. In particular, the cubic polynomial model shows a very wide NRMSE range for the large-scale test functions, which can be explained by the number of coefficients needed for the larger scale test problems in relation to the DoE size.

The performance of the RBF models improves relative to polynomials for the large-scale problems, in particular in terms of robustness.

Kriging with nugget metamodelling delivers robust performance, but inferior to the standard Kriging in terms of accuracy (average NRMSE), which is not surprising given that there is no noise in the response.

The results in Fig. 7 confirm that for low nonlinearity problems and small sample DoEs, the polynomials (in particular the cubic polynomials) deliver the best average NRMSE performance. The Kriging models (without nugget) provide very good models with NRMSE below 0.05 in all cases. RBFs can provide competitive NRMSE performance, but not in all cases, so they are not a robust choice. For high-nonlinearity problems, the results in Fig. 8 show that the Kriging method with either the Matérn 3/2 or the Matérn 5/2 correlation function performs best in terms of average NRMSE across the test functions considered.

It is worth reflecting that if the problem scale is large and the sample size is small, a non-parametric model like Kriging can be preferable since it uses fewer model parameters, whereas a Polynomial requires a number of model coefficients that grows rapidly with the number of variables and the polynomial order. In other words, the performance of Polynomials is very ‘problem-type’ and ‘sample-size’ dependent (Jin et al., 2001).
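The growth in the number of polynomial coefficients can be made concrete with a short calculation. The sketch below counts the terms of a full polynomial of a given order in *k* variables, which is the binomial coefficient C(*k* + *d*, *d*). This is an upper bound for illustration: the ‘Poly 3_c2’ structure used in the study restricts the interaction order and therefore has fewer terms than a full cubic.

```python
import math

def full_polynomial_terms(k, order):
    """Number of coefficients in a full polynomial of given order in k variables."""
    return math.comb(k + order, order)

for k in (2, 8, 20):
    small_doe = 10 * k   # MBs sample size
    large_doe = 30 * k   # MBl sample size
    print(
        k,
        full_polynomial_terms(k, 2),
        full_polynomial_terms(k, 3),
        small_doe,
        large_doe,
    )
```

For *k* = 20, a full quadratic already has 231 coefficients, more than the 200 points of the small DoE, while a full cubic has 1771, more than the 600 points of the large DoE.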

### 4.2 Evaluation of metamodelling techniques for large sample DoEs

The analysis of the results for the large sample DoE follows the same sequence as the analysis of small sample DoE results presented in the previous section.

Therefore, the Kriging modelling technique with Matérn 5/2 correlation function is shown to be the best method in terms of both accuracy and robustness over all types of problems, for both small and large sample sizes.

*regardless of problem scale*, similar to what was seen for small DoE sizes. Kriging models with Matérn 3/2 correlation function are also competitive.

### 4.3 Sensitivity to noise

In order to investigate the effects of noise in the response variable measurement on the performance of metamodelling techniques, in terms of both accuracy and robustness, all the 18 test functions in Table 2 were modelled under three noise conditions, i.e. 5%, 10% and 15% Gaussian random noise artificially added to the response data. The whole set of experiments (including both small and large DoE sample sizes) was repeated with the three levels of noise.

The performance of the Kriging with nugget metamodelling technique is remarkably insensitive to noise levels within the range considered in the study. Moreover, Kriging with nugget (with Matérn 5/2 correlation function) achieves a better average NRMSE than standard Kriging under the highest level of noise (15%) with a large DoE sample size.

As expected, a general trend of worsening NRMSE with higher noise is observed across the metamodelling techniques, with the exception of Kriging with nugget. The polynomial models also show good stability in relation to increasing noise for large DoE sample sizes.

It is also seen that the accuracy of modelling improves with the large DoE sample size, except for the RBF with Gaussian kernel. In general, RBF metamodels show the greatest sensitivity to noise.

### 4.4 Evaluation of computational efficiency of Metamodelling techniques

The efficiency of metamodelling techniques was evaluated by the time required to fit a model on an Intel i5, 2.60 GHz computer (with 8 GB RAM), as summarised in Table 4 for each problem scale and sample size. The table indicates that fitting a model using either Polynomial or RBF methods is fast regardless of the problem dimensionality (less than 1 s per model). Therefore, while Polynomial is observed to be the most time-efficient method, Kriging is comparatively very time-consuming, taking up to about 1366 s for a large-scale problem with a large DoE. This result was expected, since Kriging without a nugget factor requires a *k*-dimensional optimisation, regardless of the correlation function, to find the maximum likelihood estimates of the correlation parameters. The optimisation problem is even more complex for Kriging with a nugget factor, which requires a (*k* + 1)-dimensional optimisation, having an extra parameter to optimise for the nugget.

Time needed (in seconds) to fit a model

Model | MBs, small scale | MBs, medium scale | MBs, large scale | MBl, small scale | MBl, medium scale | MBl, large scale
---|---|---|---|---|---|---
Poly 2 | 0.01 | 0.01 | 0.46 | 0.00 | 0.02 | 0.20
Poly 3_c2 | 0.00 | 0.00 | 0.05 | 0.00 | 0.15 | 0.43
RBF_G | 0.00 | 0.00 | 0.06 | 0.01 | 0.01 | 0.20
RBF_TP | 0.00 | 0.00 | 0.02 | 0.01 | 0.02 | 0.09
RBF_MQ | 0.00 | 0.00 | 0.02 | 0.00 | 0.03 | 0.06
Krig_G | 1.81 | 8.61 | 102.22 | 5.88 | 55.03 | 653.34
Krig_M32 | 1.96 | 8.85 | 131.79 | 8.43 | 82.31 | 645.37
Krig_M52 | 1.92 | 8.09 | 109.66 | 7.12 | 48.38 | 1365.52
KrigN_G | 1.82 | 7.39 | 58.17 | 6.59 | 56.17 | 724.76
KrigN_M32 | 1.50 | 5.44 | 71.21 | 6.70 | 50.34 | 603.08
KrigN_M52 | 1.81 | 8.08 | 109.49 | 5.24 | 55.88 | 781.47

### 4.5 Overall guideline for choosing a robust metamodelling method

Kriging with Matérn 5/2 correlation function outperforms all the other modelling methods, regardless of problem scale and DoE size.

Kriging with Matérn 5/2 correlation function performs best for highly nonlinear functions, regardless of the problem scale and DoE size, while also performing reasonably accurately for low-order nonlinear problems. Polynomials (especially cubic models) also provide results competitive with the Krig_M52 method for low-order nonlinear functions; however, a Polynomial might require a larger DoE sample to calculate the model coefficients.

For *noisy data*:

Kriging with Matérn 3/2 correlation function outperforms the other modelling techniques in terms of accuracy and robustness.

Kriging with nugget factor (Matérn 5/2 correlation function) can be a better modelling option for highly noisy data (15% noise) with large DoE samples.
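The guidelines above can be condensed into a small lookup function, sketched below. The function name, argument names and the exact thresholds (treating "highly noisy" as 15% or more, and any nonzero noise as "noisy") are assumptions made for illustration; the model labels are those used in this paper.

```python
def recommend_metamodel(noise_pct, large_doe=False, low_nonlinearity=False):
    """Return candidate metamodels suggested by the guidelines, best first.

    Assumptions (not from the study): noise_pct == 0 means smooth data,
    and noise_pct >= 15 is treated as 'highly noisy'.
    """
    if noise_pct < 0:
        raise ValueError("noise_pct must be non-negative")
    if noise_pct == 0:
        candidates = ["Krig_M52"]
        if low_nonlinearity:
            # Polynomials are competitive for low-order nonlinear problems,
            # but may need a larger DoE sample.
            candidates.append("Poly 3_c2")
        return candidates
    if noise_pct >= 15 and large_doe:
        return ["KrigN_M52"]
    return ["Krig_M32"]

print(recommend_metamodel(0))
print(recommend_metamodel(0, low_nonlinearity=True))
print(recommend_metamodel(5))
print(recommend_metamodel(15, large_doe=True))
```

Such a function is only a starting point for model selection; the validation case studies illustrate how the recommendation can be checked against held-out test points.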

## 5 Validation case studies

In order to provide empirical validation for the guidelines provided in section 4.5, a modelling analysis was performed for four real-world engineering test problems, as outlined in the methodology section 3.4. For each of these four validation test case problems, all the considered metamodelling techniques were deployed, and the performance of the techniques measured in terms of NRMSE of validation test points was analysed. The results are discussed in the following sections.

### 5.1 Diesel engine fuel consumption

The aim of this case study was to predict the fuel consumption response of a Diesel engine in an engine calibration test. The test data were collected using 127 test points scheduled based on an adaptive D-Optimal DoE, covering the range of 6 design parameters, with 15 additional space-filling test points for validation (for more details see Kianifar et al. 2014).

- Medium scale: 6 design parameters;
- Medium- to high-order nonlinearity: due to the expected physical behaviour of the fuel consumption response (Kianifar et al. 2014);
- Medium DoE sample: 127 model building points for 6 parameters, which is bigger than a small DoE of 60 (i.e. 10 × 6) points, but smaller than the “large” DoE (i.e. 30 × 6 = 180);
- Low noise condition (< 5%): measurement of fuel flow for a Diesel engine on an engine dynamometer test setting is reasonably accurate, though not 0% noise.

This result is consistent with the guideline provided, which would have suggested the Kriging model with Matérn 3/2 correlation function given the presence of noise.

### 5.2 Diesel engine NOx emissions

This test case aims to predict the NOx emission response of a Diesel engine, based on data collected from the same engine dynamometer testing experiment as described in the previous section. The main difference is that measurement of NOx is characterised by higher uncertainty, hence higher noise modelling conditions. Accordingly, this problem is also medium-scale and high-order nonlinear with a large sample size, but under a 15% noise condition. The accuracy of metamodelling techniques is again compared based on the NRMSE of the 15 space-filling validation test points.

### 5.3 Metamodelling for crashworthiness/pedestrian impact simulation

This case study aims to use an efficient metamodelling technique to model the head injury criteria as a simulation response. The study focusses on the vehicle impact event simulation with a tall male pedestrian, based on an Optimal Latin Hypercube design of 100 test points: 70 points for constructing the models and 30 points for validation, over the range of 7 design parameters (for more details see Zhao et al. 2010).

- Medium scale: with 7 design parameters
- High-order nonlinear: due to the highly nonlinear behaviour of head injury response in relation to the design parameters (Zhao et al. 2010)
- Small sampling data: 70 points for 7 parameters (i.e. 10 × 7)
- About 10% noise condition, due to the applied error range during the simulation process

### 5.4 Metamodelling for the MARTHE simulation data

- Large scale: with 20 design parameters
- High-order nonlinear: due to the high-order nonlinear behaviour of the response (Volkova et al. 2006)
- Small sampling data: 200 points for 20 parameters (i.e. 10 × 20)
- 0% noise condition: collected from the MARTHE simulation code

## 6 Discussion, conclusions and future work

The main aim of this study was to carry out a comprehensive and systematic evaluation of the performance of several metamodelling techniques frequently utilised by engineers, considering the characteristics of the problems and the modelling choices and conditions—i.e. sample size and noise conditions. The motivation for this research was given by the need to provide guidelines for engineers who might seek to select a robust metamodelling strategy (from a diverse choice available) for real-world engineering problems.

The paper introduced a framework for the study of the metamodeling techniques designed to provide a balanced evaluation against all the characteristics that define the metamodeling typology—dimensionality, nonlinearity, sample size and noise. The careful selection of synthetic benchmark problems/test functions ensured a balanced set is available for each problem type. The systematic consideration of noise as a factor in metamodeling is important because it projects the validity of the study onto both physical and computer-based experiments.

The choice of techniques to be included in the evaluation was driven by the review of related studies from literature, and by the focus of the study on providing guidance for practical product development engineering metamodeling problems, where affordability of large sample sizes for the experiments to generate the data for model building is often a limiting factor. Therefore, a range of techniques that in literature have shown promising results in relation to high dimensional problems and large sample sizes (e.g. HDMR) have not been included in this study. The study was also limited in relation to the selection of correlation functions for Kriging and kernels for RBF, confined to the choice of a set that was seen from previous studies to give consistently good results for engineering problems. For the same reason, we have not considered global trend functions for either Kriging (i.e. Universal Kriging) or RBF (i.e. hybrid RBF models). We have also not considered in this study the effect of hyperparameters that can be used to further improve the fit that can be achieved with a specific metamodeling technique. The main reason for this rests with engineering practice considerations: real-world modelling problems often require many models to be developed as part of a larger modelling exercise. For example, modelling engine performance, either through physical tests or computer-based experiments (for example using CFD models for combustion), requires many models (tens and often hundreds) to be developed at different conditions (e.g. based on data collected at a range of engine speed/load setpoints). Through hyperparameter tuning and choosing from a wider pool of correlation functions or kernels, slightly better model fits could perhaps be achieved for each individual data set. However, this might mean that overall we end up with hundreds of different model structures for the same fundamental engineering problem. This will not only be confusing for the analyst, but will raise issues of confidence in the robustness of the models and modelling, as well as the interpretability of the models. Such issues have also been discussed by other researchers that sought to develop guidelines for practitioners, see for example Van Gelder et al. (2014). Therefore, our strategy has been to focus on the development of a consistent framework for describing engineering modelling problem types, and to design a systematic experiment to evaluate a set of candidates that have already been proven as good modelling choices, leading to practitioner guidelines for choosing a metamodeling technique for their specific problem type.

Under the 0% noise condition, Kriging with Matérn 5/2 correlation function outperforms the other methods for high-order nonlinear problems; however, for low-order problems, Polynomials provide competitive response models in terms of accuracy, and have the advantage that they are easier to interpret;

Problem scale has an insignificant effect on the performance of modelling techniques in terms of accuracy, and Kriging with both Matérn 5/2 and Matérn 3/2 correlation functions provide highly accurate response models;

Increasing the number of sampling points usually results in an enhancement in the performance of modelling techniques, in terms of accuracy and robustness; however, this is also dependent on the noise condition;

For both small and large DoE sample sizes, Kriging with Matérn 5/2 correlation function excels in terms of accuracy and robustness;

For engineering problems under noisy conditions, the Kriging modelling technique with Matérn 3/2 correlation function performs better than other techniques; however, its performance might deteriorate under very high-noise conditions (i.e. 15% noise or more), where using a Kriging model with nugget factor might provide better models;

In terms of computation-efficiency, both Polynomials and RBFs take a trivial amount of time to construct the models, while the Kriging can be very time-consuming, in particular for large samples; however, from a practical engineering point of view, the cost of computation is perhaps less significant compared to the benefits gained from a better quality model.

These findings, based on the results from the evaluation against the 18 test problems, have supported the development of a set of modelling guidelines for engineers, which in essence can be summarised as follows: (i) for engineering problems with smooth-data (i.e. 0% noise), Kriging with Matérn 5/2 correlation function is the most dependable metamodeling method, and (ii) when experimental noise conditions are significant, Kriging with Matérn 3/2 correlation function provides a robust metamodeling choice. The capability of Kriging with the Matérn class of correlation functions is due to the flexibility and realistic smoothness assumptions in modelling many physical processes (Stein 2012). Within our study, the proposed guidelines have been validated with four engineering case studies, which showed that the suggested modelling technique based on the problem type confirmed the performance expectations.

The framework developed in this paper should provide a basis for further research on evaluation of metamodeling techniques. The structure of the experiment, including the balanced set of benchmark functions structured on problem types, provides a good testbed for further studies, which could include a broader set of metamodelling techniques as well as the effect of hyperparameters. The role of dynamic approaches for optimal metamodel selection, e.g. as discussed by Zhao et al. (2011), could also be evaluated within the context of this framework.

Most importantly, we see the impact of this paper on engineering practitioners dealing with modelling problems. The principle of the guidelines developed in this paper is to identify a robust metamodeling choice for a problem type. This encourages the engineer to start by analysing the problem in relation to its characteristics, firstly in relation to dimensionality and complexity (either nonlinearity, or in terms of interactions and importance of terms, which could be evaluated through smaller screening experiments), and then in relation to the sample size affordability and the level of noise affecting the variables and the experiment (which could be assessed with standard repeatability and reproducibility tests, which are common practice in engineering). This should not only provide a more robust basis for metamodeling, but also increase the confidence and ultimately the take-up of metamodeling in engineering practice.

## 7 Replication of results

The test functions are listed in the Appendix, and the hyperparameters of the metamodelling techniques implemented in the paper are discussed in Section 3; therefore, the results should be fully reproducible by other researchers. The implementation was in Matlab, and the scripts can be made available by request to interested researchers via the corresponding author.

## Notes

### Compliance with ethical standards

### Conflict of interest

The authors declare that they have no conflict of interest.

## References

- An J, Owen A (2001) Quasi-regression. J Complex 17(4):588–607MathSciNetzbMATHGoogle Scholar
- Bäck T (1996) Evolutionary algorithms in theory and practice: evolution strategies, evolutionary programming, genetic algorithms. Oxford University PressGoogle Scholar
- Ben-Ari EN and Steinberg DM (2007) Modeling Data from Computer Experiments: An Empirical Comparison of Kriging with MARS and Projection Pursuit Regression, Quality Engineering, 19:4, 327-338, https://doi.org/10.1080/08982110701580930 Google Scholar
- Berger B (2012) Modelling and optimisation for Stationary Base engine calibration, PhD thesis, TU MunichGoogle Scholar
- Blondet G, Duigou J Le and Boudaoud N (2019) A knowledge-based system for numerical design of experiments processes in mechanical engineering. Expert Syst Appl 122. Pergamon: 289–302Google Scholar
- Box GEP, Draper NR (1987) Empirical model-building and response surfaces. John Wiley & Sons, Oxford, EnglandzbMATHGoogle Scholar
- Campean F, Ananta K, Yin XF, Grove D (2010) Evaluation of modelling techniques for engine response models, proc Int congress on automotive and transport engineering CONAT 2010, vol 2, pp 95–102 ISSN 2069-0428Google Scholar
- Chen S, Chng ES and Alkadhimi K (1996) Regularized orthogonal least squares algorithm for constructing radial basis function networks. International journal of control 64(5). Taylor & Francis: 829–837Google Scholar
- Chen VCP, Tsui K-L, Barton RR, et al. (2006) A review on design, modeling and applications of computer experiments. IIE transactions 38(4). Taylor & Francis Group: 273–291Google Scholar
- Chen L, Qiu H, Jiang C, et al. (2018) Support vector enhanced kriging for metamodeling with noisy data. Struct Multidiscip Optim 57(4):1611–1623Google Scholar
- Chen, L., Wang, H., Ye, F., & Hu, W. (2019). Comparative study of HDMRs and other popular metamodeling techniques for high dimensional problems. Struct Multidiscip Optim 59(1):21–42Google Scholar
- Couckuyt I, Forrester A, Gorissen D et al (2012) Blind kriging: implementation and performance analysis. Adv Eng Softw 49:1–13. https://doi.org/10.1016/j.advengsoft.2012.03.002 CrossRefGoogle Scholar
- Cressie N (1990) The origins of kriging. Math Geol 22(3):239–252MathSciNetzbMATHGoogle Scholar
- Dey S, Mukhopadhyay T, Adhikari S (2015) Stochastic free vibration analysis of angle-ply composite plates-a RS-HDMR approach. Compos Struct 122:526–536Google Scholar
- Didcock N, Rainer A and Jakubek S (2014) Optimization and optimal control in automotive systems. Waschl H, Kolmanovsky I, Steinbuch M, et al. (eds). Lecture Notes in Control and Information Sciences. Cham: Springer International PublishingGoogle Scholar
- Dixon LCW (1978) The global optimization problem: An introduction. 2. Available at: https://www.researchgate.net/publication/247892177_The_global_optimization_problem_An_introduction (accessed 24 March 2016)
- Fang H, Rais-Rohani M, Liu Z et al (2005) A comparative study of metamodeling methods for multiobjective crashworthiness optimization. Comput Struct 83(25–26):2121–2136
- Fang K, Li R, Sudjianto A (2010) Design and modeling for computer experiments. CRC Press
- Forrester A, Sobester DA, Keane A (2008) Engineering design via surrogate modelling: a practical guide. John Wiley & Sons
- Forsberg J, Nilsson L (2004) On polynomial response surfaces and kriging for use in structural optimization of crashworthiness. Struct Multidiscip Optim 29(3):232–243
- Giunta AA and Watson LT (1998) A Comparison of Approximation Modeling Techniques: Polynomial Versus Interpolating Models, 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, AIAA 98–4758, Vol. 1, pp. 392–404. https://doi.org/10.2514/6.1998-4758
- Global Optimization Test Problems (n.d.) Available at: http://www.mat.univie.ac.at/~neum/glopt/test.html (accessed 24 March 2016)
- Gramacy RB and Lian H (2012) Gaussian process single-index models as emulators for computer experiments. Technometrics. Taylor & Francis Group 54(1):30–41
- Hagan MT and Demuth HB (1999) Neural networks for control. In: Proceedings of the 1999 American control conference (cat. No. 99CH36251), pp. 1642–1656. IEEE
- Hartmann B, Baumann W, Nelles O (2013) Axes-Oblique Partitioning of Local Model Networks for Engine Calibration. Design of Experiments (DoE) in Engine Development, Expert-Verlag GmbH: 92–106
- He Y, Rutland CJ (2004) Application of artificial neural networks in engine modelling. Int J Engine Res 5(4):281–296
- Howlett RJ, Zoysa MM, Walters SD, et al. (1999) Neural Network techniques for monitoring and control of internal combustion engines. In: Int. Symposium on Intelligent Industrial Automation, Genova, Italy, 1999
- Huang Z, Qiu H, Zhao M, Cai X, Gao L (2015) An adaptive SVR-HDMR model for approximating high dimensional problems. Eng Comput 32(3):643–667
- Jin R, Chen W, Simpson TW (2001) Comparative studies of metamodelling techniques under multiple modelling criteria. Struct Multidiscip Optim 23(1):1–13
- Jin R, Du X, Chen W (2003) The use of metamodeling techniques for optimization under uncertainty. Struct Multidiscip Optim 25(2):99–116
- Kaymaz I (2005) Application of kriging method to structural reliability problems. Struct Saf 27(2):133–151
- Kersaudy P, Sudret B, Varsier N et al (2015) A new surrogate modeling technique combining kriging and polynomial chaos expansions – application to uncertainty analysis in computational dosimetry. J Comput Phys 286:103–117
- Khan MAZ (2011) Transient engine model for calibration using two-stage regression approach. PhD Thesis, Loughborough University. Available at: https://dspace.lboro.ac.uk/dspace-jspui/handle/2134/8456 (accessed 18 December 2013)
- Kianifar MR, Campean F, Beattie T, et al. (2014) Analytical target cascading framework for diesel engine calibration optimisation. In: 13 October 2014
- Kianifar MR, Campean F, Wood A (2015) Application of permutation genetic algorithm for sequential model building–model validation design of experiments. Soft Comput:1–22
- Kim B-S, Lee Y-B, Choi D-H (2009) Comparison study on the accuracy of metamodeling technique for non-convex functions. J Mech Sci Technol 23(4):1175–1181
- Kleijnen JPC (2009) Kriging metamodeling in simulation: a review. Eur J Oper Res 192(3):707–716
- Kleijnen JPC, van Beers WCM (2005) Robustness of Kriging when interpolating in random simulation with heterogeneous variances: some experiments. Eur J Oper Res 165(3):826–834
- Kroetz HM, Tessari RK and Beck AT (2017) Performance of global metamodeling techniques in solution of structural reliability problems. Adv Eng Softw 114. Elsevier: 394–404
- Lach R, Weber F, Siebertz K, et al. (2007) DoE application within the base engine design. In: Design of Experiments in Engine Development (ed. K Roepke), 2007, pp. 249–260. Expert Verlag
- Li YF, Ng SH, Xie M, Goh TN (2010) A systematic comparison of metamodeling techniques for simulation optimization in decision support systems. Appl Soft Comput 10(4):1257–1273
- Liu H, Xu S and Wang X (2016) Sampling strategies and metamodeling techniques for engineering design: comparison and application. In: Volume 2C: Turbomachinery, 13 June 2016, p. V02CT45A019. ASME
- Loeppky JL, Sacks J and Welch WJ (2009) Choosing the sample size of a computer experiment: a practical guide. Technometrics 51(4). Taylor & Francis: 366–376
- Marrel A, Iooss B, Van Dorpe F et al (2008) An efficient methodology for modeling complex computer codes with Gaussian processes. Comput Statist Data Anal 52(10):4731–4744
- Matheron G (1976) Advanced geostatistics in the mining industry. Guarascio M, David M, and Huijbregts C (eds). Dordrecht: Springer Netherlands
- Meckesheimer M, Barton RR, Simpson T et al (2001) Metamodeling of combined discrete/continuous responses. AIAA J 39(10):1950–1959
- Moon H, Dean AM and Santner TJ (2012) Two-stage sensitivity-based group screening in computer experiments. Technometrics. Taylor & Francis Group 54(4):376–387
- Morris MD, Mitchell TJ (1995) Exploratory designs for computational experiments. J Stat Plan Inference 43(3):381–402
- Moyeed RA, Papritz A (2002) An Empirical Comparison of Kriging Methods for Nonlinear Spatial Point prediction. Mathematical Geology 34(4):365–386. https://doi.org/10.1023/A:1015085810154
- Mullur AA, Messac A (2005) Metamodeling using extended radial basis functions: a comparative approach. Eng Comput 21(3):203–217
- Myers RH, Khuri AI and Carter WH (1989) Response surface methodology. Technometrics 31(2). Taylor & Francis: 137–157
- de Oliveira MA, Possamai O, Dalla Valentina LVO et al (2013) Modeling the leadership – project performance relation: radial basis function, Gaussian and Kriging methods as alternatives to linear regression. Expert Syst Appl 40(1):272–280
- Østergård T, Jensen RL and Maagaard SE (2018) A comparison of six metamodeling techniques applied to building performance simulations. Appl Energy 211. Elsevier: 89–103
- Paiva R, Crawford C, Suleman A, et al. (2009) A comparison of surrogate models in the framework of an MDO tool for wing design. In: 50th AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics, and materials conference, Reston, Virginia, 2009. American Institute of Aeronautics and Astronautics
- Pan F, Zhu P, Zhang Y (2010) Metamodel-based lightweight design of B-pillar with TWB structure via support vector regression. Comput Struct 88(1–2):36–44
- Passos F, Gonzalez-Echevarria R, Roca E, et al. (2015) Surrogate modeling and optimization of inductor performances using Kriging functions. In: 2015 International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), 2015, pp. 1–4
- Picheny V, Ginsbourger D, Roustant O, et al. (2010) Adaptive Designs of Experiments for Accurate Approximation of a Target Region. J Mech Des 132(7):9. https://doi.org/10.1115/1.4001873
- Picheny V, Wagner T, Ginsbourger D (2013a) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidiscip Optim 48(3):607–626
- Picheny V, Wagner T, Ginsbourger D (2013b) A benchmark of kriging-based infill criteria for noisy optimization. Struct Multidiscip Optim 48(3):607–626
- Rango J, Schnorbus T, Kwee H et al (2013) Comparison of different approaches for global modeling of combustion engines. In: Design of Experiments (DoE) in engine development, vol 2013, pp 70–91
- Sacks J, Welch WJ, Mitchell TJ et al (1989) Design and analysis of computer experiments (with discussion). Statistical Science 4:409–435
- Sacks J, Welch WJ, Mitchell TJ et al (1989) Design and analysis of computer experiments. Statistical science. Institute of Mathematical Statistics 4(4):409–423
- Schlober A, Linssen R and Bozelie P (2007) Model-based calibration of SI-engines: map optimization. In: Design of Experiments in Engine Development (ed. K Roepke), 2007, pp. 145–164. Expert Verlag
- Seabrook J, Salamon T, Edwards S et al. (2003) A comparison of neural networks, stochastic process methods and radial basis functions for the optimisation of engine control parameters. Tagung, Haus der Technik, Design of Experiments in engine development. Berlin
- Seabrook J, Collins J and Edwards S (2005) Application of advanced modelling techniques to the calibration of gasoline engines with direct injection and variable valve timing. In: Design of Experiments in Engine Development (ed. K Roepke), 2005, pp. 235–245. Expert Verlag
- Shan S and Wang GG (2010) Survey of modeling and optimization strategies to solve high-dimensional design problems with computationally-expensive black-box functions. Structural and multidisciplinary optimization 41(2). Springer-Verlag: 219–241
- Simpson TW, Korte JJ, Mauery TM, et al. (1998) Comparison of response surface and kriging models for multidisciplinary design optimization. In: 7th AIAA/USAF/NASA/ISSMO symposium on multidisciplinary analysis & optimization
- Simpson TW, Mauery TM, Korte JJ et al (2001a) Kriging models for global approximation in simulation-based multidisciplinary design optimization. AIAA J 39(12):2233–2241
- Simpson TW, Poplinski JD, Koch PN et al (2001b) Metamodels for computer-based engineering design: survey and recommendations. Eng Comput 17(2):129–150
- Stein ML (2012) Interpolation of spatial data: some theory for kriging. Springer Science & Business Media, New York
- Toal DJJ, Keane AJ (2013) Performance of an ensemble of ordinary, universal, non-stationary and limit Kriging predictors. Struct Multidiscip Optim 47(6):893–903
- Ulaganathan S, Couckuyt I, Ferranti F et al (2014) Performance study of multi-fidelity gradient enhanced kriging. Struct Multidiscip Optim 51(5):1017–1033
- Ulaganathan S, Couckuyt I, Dhaene T et al (2015) Performance study of gradient-enhanced Kriging. Eng Comput 32(1):15–34
- Van Gelder L, Das P, Janssen H, et al. (2014) Comparative study of metamodelling techniques in building energy simulation: guidelines for practitioners. Simul Mod Prac and Theory 49:245–257. https://doi.org/10.1016/j.simpat.2014.10.004
- Varadarajan S, Chen W, Pelka CJ (2000) Robust concept exploration of propulsion systems with enhanced model approximation capabilities. Engineering Optimization 32(3). Taylor & Francis Group: 309–334
- Vicente-Serrano S, Saz-Sánchez M, Cuadrat J (2003) Comparative analysis of interpolation methods in the middle Ebro Valley (Spain): application to annual precipitation and temperature. Clim Res 24:161–180
- Volkova E, Iooss B, Van Dorpe F (2006) Global sensitivity analysis for a numerical model of radionuclide migration from the RRC “Kurchatov Institute” radwaste disposal site. Stoch Env Res Risk A 22(1):17–31
- Wang G and Shan S (2005) Review of metamodeling techniques for product design with computation-intensive processes. In: Proceedings of the Canadian Engineering Education Association
- Wang H, Li E, Li GY (2009) The least square support vector regression coupled with parallel sampling scheme metamodeling technique and application in sheet forming optimization. Mater Des 30(5):1468–1479
- Wang H, Shan S, Wang GG, Li G (2011) Integrating least square support vector regression and mode pursuing sampling optimization for crashworthiness design. J Mech Des 133(4)
- Welch WJ, Buck RJ, Sacks J, et al. (1992a) Screening, predicting, and computer experiments. Technometrics 34(1). Taylor & Francis Group
- Welch WJ, Buck RJ, Sacks J, et al. (1992b) Screening, predicting, and computer experiments. Technometrics 34(1). Taylor & Francis Group: 15–25
- Xiong S, Qian PZG and Wu CFJ (2013) Sequential design and analysis of high-accuracy and low-accuracy computer codes. Technometrics. Taylor & Francis Group 55(1):37–46
- Yang RJ, Gu L, et al. (2000) Approximations for safety optimization of large systems. In: Proceeding of the 2000 ASME design engineering technical conferences and computers and information in engineering conference, Baltimore, Maryland, 2000, pp. 763–772
- Zhao D, Xue D (2010) A comparative study of metamodeling methods considering sample quality merits. Struct Multidiscip Optim 42(6):923–938
- Zhao Y, Rosala GF, Campean IF, et al. (2010) A response surface approach to front-car optimisation for minimising pedestrian head injury levels. International journal of crashworthiness. Taylor & Francis Group 15(2):143–150
- Zhao L, Choi KK, Lee I (2011) Metamodeling method using dynamic kriging for design optimization. AIAA J 49(9):2034–2046. https://doi.org/10.2514/1.J0551017
- Zhou Q, Qian PZG and Zhou S (2012) A simple approach to emulation for computer models with qualitative and quantitative factors. Technometrics. Taylor & Francis
- Zhu P, Zhang Y, Chen G-L (2009) Metamodel-based lightweight design of an automotive front-body structure using robust optimization. Proc Inst Mech Eng, Part D: Journal of Automobile Engineering 223(9):1133–1147. https://doi.org/10.1243/09544070JAUTO1045

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.