1 Introduction

With exponentially increasing computing power, designers today have the possibility, through simulation-driven product development, to create new innovative and complex products in a short time. In addition, simulation-based design reduces the cost of product development by eliminating the need to build several physical prototypes. Furthermore, a designer can create an optimized design with respect to multiple objectives, several constraints and many design variables. However, the models and simulations, particularly those involved in multidisciplinary design optimization (MDO), can be very complex and computationally expensive, see e.g. the multi-objective optimization of a disc brake in Amouzgar et al. (2013). Surrogate models, or metamodels, are widely accepted in the MDO community as a way to deal with this issue. A metamodel is an explicit approximation function that predicts the response of a computationally expensive simulation-based model, such as a non-linear finite element model, and thereby establishes a relation between the input variables and the corresponding responses. In general, the aim of a metamodel is to approximate the original function over a given design domain. Many metamodeling methods have been developed for metamodel based design optimization problems. Some of the most recognized and studied metamodels are response surface methodology (RSM) or polynomial regression (Box and Wilson 1951), Kriging (Sacks et al. 1989), radial basis functions (Hardy 1971), support vector regression (SVR) (Vapnik et al. 1996) and artificial neural networks (Haykin 1998). Extensive surveys and reviews of different metamodeling methods and their applications are given by e.g. Simpson et al. (2001a, 2008), Wang and Shan (2007) and Forrester and Keane (2009).

Several comparative studies, investigating the accuracy and effectiveness of various surrogate models, can be found in the literature. However, one cannot find an agreement on the dominance of one specific method over others. In an early study, Simpson et al. (1998) compared second-order response surfaces with Kriging. The metamodels were applied to a multidisciplinary design problem and four optimization problems. Jin et al. (2001) conducted a systematic comparison of four different metamodeling techniques: polynomial regression, Kriging, multivariate adaptive regression splines and radial basis functions (RBF). They used 13 mathematical test functions and an engineering test problem, considering various characteristics of the sample data and evaluation criteria. They concluded that, overall, RBF performed best for both large- and small-scale problems with a high order of non-linearity. Fang et al. (2005) studied RSM and RBF to find the best method for modeling the highly non-linear responses found in impact-related problems, and also compared the two models on a highly non-linear test function. Despite the computational cost of RBF, they concluded that RBF dominated RSM in such optimization problems. Mullur and Messac (2006) compared extended radial basis functions (E-RBF) with three other approaches: RSM, RBF and Kriging. A number of modelling criteria, including problem dimension, sampling technique, sample size and performance criteria, were employed. The E-RBF was identified as the superior method since parameter setting was avoided and the method resulted in an accurate metamodel without a significant increase in computation time. Kim et al. (2009) performed a comparative study of four metamodeling techniques using six mathematical functions and evaluated the results by the root mean squared error. Kriging and moving least squares showed promising results in that study. In another study by Zhao and Xue (2010), four metamodeling methods were compared by considering three characteristics of the quality of the sample (sample size, uniformity and noise) and four performance measures (accuracy, confidence, robustness and efficiency). Backlund et al. (2012) studied the accuracy of RBF, Kriging and support vector regression (SVR) with respect to their capability of approximating functions with a large number of variables and varying modality. The conclusion was that Kriging appeared to be the dominant method in its ability to approximate accurately with a fewer or equal number of training points. Also, unlike for RBF and SVR, the parameter tuning in Kriging was done automatically during the training process. RBF was found to be the slowest in building the model for a large number of training points, whereas SVR was the fastest for large-scale multi-modal problems.

In most of the previously conducted comparison studies, RBF has been shown to perform well on different test problems and engineering applications. Therefore, we do not see a need to compare RBF with other metamodeling techniques again in this paper. Instead, we focus on a detailed and comprehensive comparison of our proposed RBF with a priori bias and the classical augmented RBF (RBF with a posteriori bias). The factors that are present during the construction of a metamodel (modeling criteria) include the dimension of the problem, the type of radial basis function used, the sampling technique and the sample size. The evaluation of these modeling criteria and their effect on the accuracy, performance and robustness of a metamodel will help the designer to choose an appropriate metamodeling technique for a specific application. A recent comparison study of these two approaches has been conducted by the authors (Amouzgar and Strömberg 2014). The preliminary results revealed the potential of RBF with a priori bias in predicting the test problem values. This potential is evaluated in detail in this paper for nine established mathematical test functions. A pre-study on the performance of our RBF with a priori bias in metamodel based design optimization is also performed for two benchmarks. The results clearly demonstrate that our RBF with a priori bias is a most attractive choice of surrogate model in MDO.

2 Radial Basis Functions Networks

Radial basis functions were first used by Hardy (1971) for multivariate data interpolation. He proposed RBFs as interpolating approximation functions by solving multiquadric equations of topography based on scattered coordinate data. A radial basis function network of the ingoing variables x i , collected in x, can be written as

$$ f(\boldsymbol{x}) = \sum\limits_{i=1}^{N_{\Phi}} {\Phi}_{i}(\boldsymbol{x}) \alpha_{i} + b(\boldsymbol{x}), $$
(1)

where f = f(x) is the outgoing response of the network, Φ i (x) represents the radial basis functions, N Φ is the number of radial basis functions, α i are the weights and b = b(x) is a bias. The network is depicted in Fig. 1.

Fig. 1
figure 1

a A radial basis functions network; b the Gauss function

Examples of popular radial basis functions are

$$\begin{array}{@{}rcl@{}} \text{Linear:}\quad {\Phi}_{i}(r)&=&r,\\ \text {Cubic:}\quad {\Phi}_{i}(r)&=&r^{3},\\ \text {Gaussian:}\quad {\Phi}_{i}(r)&=&e^{-\theta_{i} r^{2}},\; 0\leq \theta_{i} \leq1, \\ \text {Quadratic:}\quad {\Phi}_{i}(r)&=&\sqrt{r^{2}+{\theta_{i}^{2}}},\; 0\leq \theta_{i} \leq1, \end{array} $$
(2)

where 𝜃 i represents the shape parameters and

$$ r(\boldsymbol{x})=\sqrt{(\boldsymbol{x}-\boldsymbol{c}_{i})^{T}(\boldsymbol{x}-\boldsymbol{c}_{i})} $$
(3)

is the radial distance. The shape parameter controls the width of the radial basis function. A narrow basis function only has an effect on a small surrounding region, so that the prediction at an unknown point is influenced only by sample points in its very close neighbourhood. In this case there is a risk of overfitting: an overfitted response surface does not capture the true function accurately but rather describes the noise, even for noise-free data sets. Furthermore, c i is the center point of each radial basis function. The number of center points is commonly set equal to the number of sample points, and we have found that using the sample points as the center points usually results in a more accurate model.
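To make this concrete, the following is a minimal sketch (in Python/NumPy, not the authors' Matlab code) of the radial basis functions in (2) and the radial distance in (3). The function names and the default shape parameter θ = 1 are illustrative choices; θ = 1 matches the setting used later in Section 5.

```python
import numpy as np

def radial_distance(x, c):
    """Radial distance r(x) = sqrt((x - c)^T (x - c)), eq. (3)."""
    d = np.asarray(x, dtype=float) - np.asarray(c, dtype=float)
    return np.sqrt(np.dot(d, d))

def rbf_kernel(r, kind="cubic", theta=1.0):
    """One radial basis function Phi(r) from eq. (2)."""
    if kind == "linear":
        return r
    if kind == "cubic":
        return r ** 3
    if kind == "gaussian":
        return np.exp(-theta * r ** 2)
    if kind == "quadratic":
        return np.sqrt(r ** 2 + theta ** 2)
    raise ValueError(f"unknown radial basis function: {kind}")
```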

In this work the bias is taken to be a polynomial function, which is treated as known either a priori or a posteriori. The bias is formulated as

$$ b=\sum\limits_{i=1}^{N_{\beta}} \xi_{i}(\boldsymbol{x}) \beta_{i}, $$
(4)

where ξ i (x) represents the polynomial basis functions and β i are constants. N β is the number of terms in the polynomial function.

Thus, for a particular signal \(\hat {\boldsymbol {x}}_{k}\) the outcome of the network can be written as

$$ f_{k}=f(\hat{\boldsymbol{x}}_{k}) = \sum\limits_{i=1}^{N_{\Phi}} A_{ki} \alpha_{i} +\sum\limits_{i=1}^{N_{\beta}} B_{ki}\beta_{i}, $$
(5)

where

$$ \begin{array}{ccc} A_{ki}={\Phi}_{i}(\hat{\boldsymbol{x}}_{k}) & \text{and} & B_{ki}=\xi_{i}(\hat{\boldsymbol{x}}_{k}). \end{array} $$
(6)

Furthermore, for a set of signals, the corresponding outgoing responses f = {f i } of the network can be formulated compactly as

$$ \boldsymbol{f}= \boldsymbol{A}\boldsymbol{\alpha} + \boldsymbol{B}\boldsymbol{\beta}, $$
(7)

where α = {α i }, β = {β i }, A = [A ki ] and B = [B ki ].
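As an illustration of (5)-(7), the sketch below assembles the matrices A and B for a set of sampled signals, using the kernel helpers sketched above. The quadratic polynomial bias basis (constant, linear and quadratic terms) is an assumption consistent with the quadratic bias used later in Section 5, and the function names are not from the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_basis(x):
    """Quadratic polynomial basis xi_i(x): [1, x_1, ..., x_m, x_1^2, x_1*x_2, ...]."""
    x = np.asarray(x, dtype=float)
    terms = [1.0] + list(x)
    terms += [x[i] * x[j] for i, j in combinations_with_replacement(range(x.size), 2)]
    return np.array(terms)

def assemble(X, centers, kind="cubic", theta=1.0):
    """Return A (N_d x N_Phi) and B (N_d x N_beta) according to (6)."""
    A = np.array([[rbf_kernel(radial_distance(xk, ci), kind, theta) for ci in centers]
                  for xk in X])
    B = np.array([poly_basis(xk) for xk in X])
    return A, B
```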

2.1 Bias known a priori

We suggest setting up the RBF in (1) by treating the bias as known a priori. This approach is presented in this subsection; the established approach, in which the bias is treated as unknown, is presented in the next one.

The network in (1) is trained in order to fit a set of known data \(\{\hat {\boldsymbol {x}}_{k},\hat {f}_{k}\}\). We assume that the number of data points is N d and collect all \(\hat {f}_{k}\) in \(\hat {\boldsymbol {f}}\). The training is performed by minimizing the error

$$ \boldsymbol{\epsilon} = \boldsymbol{f} -\hat{\boldsymbol{f}} $$
(8)

in the least squares sense. We begin by considering this problem when the constants \(\boldsymbol {\beta }=\hat {\boldsymbol { \beta }}\) are known a priori. The minimization problem then reads

$$ \min_{\boldsymbol{\alpha}}\, \frac{1}{2} \left( \boldsymbol{A}\boldsymbol{\alpha}-(\hat{\boldsymbol{f}} -\boldsymbol{B}\hat{\boldsymbol{\beta}})\right)^{T}\left( \boldsymbol{A}\boldsymbol{\alpha}-(\hat{\boldsymbol{f}} -\boldsymbol{ B}\hat{\boldsymbol{\beta}})\right). $$
(9)

The solution to this problem is given by

$$ \hat{\boldsymbol{\alpha}}= \left( \boldsymbol{A}^{T}\boldsymbol{A} \right)^{-1} \boldsymbol{A}^{T}\left( \hat{\boldsymbol{f}}-\boldsymbol{B}\hat{\boldsymbol{\beta}}\right). $$
(10)

An obvious possibility to define \(\hat {\boldsymbol {\beta }}\) a priori, which is used in this work, is to use the following optimal regression coefficients:

$$ \hat{\boldsymbol{\beta}}=\left( \boldsymbol{B}^{T}\boldsymbol{B}\right)^{-1}\boldsymbol{B}^{T}\hat{\boldsymbol{f}}. $$
(11)
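A minimal sketch of the a priori training follows: \(\hat{\boldsymbol{\beta}}\) from the regression normal equation (11), followed by \(\hat{\boldsymbol{\alpha}}\) from (10). Least-squares solves replace the explicit matrix inverses purely for numerical robustness; this is an implementation choice, not something prescribed above.

```python
import numpy as np

def fit_rbf_pri(A, B, f_hat):
    """Train the a priori RBF: beta from eq. (11), then alpha from eq. (10)."""
    beta_hat, *_ = np.linalg.lstsq(B, f_hat, rcond=None)                  # eq. (11)
    alpha_hat, *_ = np.linalg.lstsq(A, f_hat - B @ beta_hat, rcond=None)  # eq. (10)
    return alpha_hat, beta_hat
```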

2.2 Bias known a posteriori

If the bias is considered not to be known a priori, then (9) modifies to

$$ \min_{(\boldsymbol{\alpha},\boldsymbol{\beta})} \, \frac{1}{2} \left( \boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{B}\boldsymbol{ \beta}-\hat{\boldsymbol{f}}\right)^{T}\left( \boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{B} \boldsymbol{\beta}-\hat{\boldsymbol{ f}}\right). $$
(12)

Furthermore, if we also assume that N Φ + N β > N d , then the following orthogonality constraint is introduced:

$$ \sum\limits_{i=1}^{N_{\Phi}} \xi_{j}(\boldsymbol{c}_{i})\alpha_{i} =0, j=1,\ldots,N_{\beta}. $$
(13)

This can be written in matrix format as

$$ \boldsymbol{R}^{T}\boldsymbol{\alpha} = \boldsymbol{0}, $$
(14)

where

$$ \begin{array}{cc} \boldsymbol{R}=[R_{ij}], & R_{ij} = \xi_{j}(\boldsymbol{c}_{i}). \end{array} $$
(15)

In conclusion, for a bias known a posteriori, we have to solve the following problem:

$$ \left\{ \begin{array}{l} \min\limits_{(\boldsymbol{\alpha},\boldsymbol{\beta})} \, \frac{1}{2} \left( \boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{B}\boldsymbol{\beta}- \hat{\boldsymbol{f}}\right)^{T}\left( \boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{B}\boldsymbol{\beta}-\hat{\boldsymbol{ f}}\right) \\ \text{s.t. } \boldsymbol{R}^{T} \boldsymbol{\alpha} =\boldsymbol{0}. \end{array}\right. $$
(16)

The corresponding Lagrangian function is given by

$$ \mathcal{L}(\boldsymbol{\alpha},\boldsymbol{\beta},\boldsymbol{\lambda}) = \frac{1}{2} \left( \boldsymbol{A}\boldsymbol{ \alpha}+\boldsymbol{B}\boldsymbol{\beta}-\hat{\boldsymbol{f}}\right)^{T}\left( \boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{ B}\boldsymbol{\beta}-\hat{\boldsymbol{f}}\right) + \boldsymbol{\lambda}^{T} \boldsymbol{R}^{T} \boldsymbol{\alpha}. $$
(17)

The necessary optimality conditions become

$$\begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial \boldsymbol{\alpha}}&=& \boldsymbol{A}^{T} (\boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{ B}\boldsymbol{\beta}-\hat{\boldsymbol{f}})+\boldsymbol{R}\boldsymbol{\lambda} =\boldsymbol{0}, \\ \frac{\partial \mathcal{L}}{\partial \boldsymbol{\beta}} &= &\boldsymbol{B}^{T}(\boldsymbol{A}\boldsymbol{\alpha}+\boldsymbol{ B}\boldsymbol{\beta}-\hat{\boldsymbol{f}})=\boldsymbol{ 0},\\ \frac{\partial \mathcal{L}}{\partial \boldsymbol{\lambda}}& =& \boldsymbol{R}^{T}\boldsymbol{\alpha}=\boldsymbol{0}. \end{array} $$
(18)

The optimality conditions in (18) can also be written in matrix format as

$$ \left[ \begin{array}{ccc} \boldsymbol{A}^{T}\boldsymbol{A} \; & \boldsymbol{A}^{T}\boldsymbol{B} \; & \boldsymbol{R} \\ \boldsymbol{B}^{T}\boldsymbol{A} \; & \boldsymbol{B}^{T}\boldsymbol{B} \; & \boldsymbol{0} \\ \boldsymbol{R}^{T} \; & \boldsymbol{0} \; & \boldsymbol{0} \end{array}\right] \left\{ \begin{array}{c} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \\ \boldsymbol{\lambda} \end{array}\right\} = \left\{ \begin{array}{c} \boldsymbol{A}^{T}\hat{\boldsymbol{f} }\\ \boldsymbol{B}^{T}\hat{\boldsymbol{f}}\\ \boldsymbol{0} \end{array} \right\}. $$
(19)

By solving this system of equations, the radial basis function network with a bias known a posteriori is established.

If the center points c i are chosen to be equal to \(\hat {\boldsymbol {x}}_{i}\), then R = B, the network becomes an interpolation, and (19) reduces to

$$ \left[ \begin{array}{ccc} \boldsymbol{A} \;& \boldsymbol{B} \\ \boldsymbol{B}^{T} \; & \boldsymbol{0} \end{array}\right] \left\{ \begin{array}{c} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{array}\right\} = \left\{ \begin{array}{c} \hat{\boldsymbol{f}}\\ \boldsymbol{0} \end{array} \right\}. $$
(20)

This is the established approach for setting up the RBF in (1). We suggest that one can instead simply use (10) and (11), which are nothing more than two normal equations: (10) is the normal equation of (9) and (11) is the normal equation of the corresponding regression problem. Obviously, the two approaches will produce different RBFs. This is demonstrated in Fig. 2, where the biases obtained with the two approaches are compared for the same benchmark problem. In the following, the performance of these two approaches is studied in detail. In the remainder of the paper, the RBF with the bias known a posteriori is briefly called the a posteriori RBF and abbreviated \(RBF_{pos}\), and the RBF with the bias known a priori is called the a priori RBF and abbreviated \(RBF_{pri}\).
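For comparison, the following sketch solves the reduced interpolation system (20) for the a posteriori RBF, with the center points taken equal to the sample points so that R = B. As before, the function name is illustrative only.

```python
import numpy as np

def fit_rbf_pos(A, B, f_hat):
    """Train the a posteriori RBF by solving the linear system in eq. (20)."""
    n_d, n_beta = B.shape
    K = np.block([[A, B],
                  [B.T, np.zeros((n_beta, n_beta))]])
    rhs = np.concatenate([np.asarray(f_hat, dtype=float), np.zeros(n_beta)])
    sol = np.linalg.solve(K, rhs)
    return sol[:n_d], sol[n_d:]   # alpha, beta
```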

Fig. 2
figure 2

The bias plotted for test function 4 using both approaches: (a) RBF with a priori known bias, (b) RBF with a posteriori known bias

3 Test functions

The comparison of the two RBF approaches is based on nine different mathematical test functions, presented below. These test functions are commonly used as benchmarks for unconstrained global optimization problems.

  1.

    Branin-Hoo function (Branin 1972)

    $$ {f}_{1}=\left(x_{2}-\frac{5.1{x_{1}^{2}}}{4\pi^{2}}+\frac{5x_{1}}{\pi}-6\right)^{2}+10\left(1-\frac{1}{8\pi}\right)\cos(x_{1})+10. $$
    (21)
  2.

    Goldstein-Price function (Goldstein and Price 1971)

    $$ \begin{array}{ll} f_{2}=&\left( 1+(x_{1}+x_{2}+1)^{2}\left( 19-14x_{1}+3{x_{1}^{2}}-14x_{2}\right.\right.\\ &\left.\left.+6x_{1}x_{2}+3{x_{2}^{2}}\right)\right)\\ &\times\left( 30+(2x_{1}-3x_{2})^{2}\left( 18-32x_{1}+12{x_{1}^{2}}+48x_{2}\right.\right.\\ &\left.\left.-36x_{1}x_{2}+27{x_{2}^{2}}\right)\right) . \end{array} $$
    (22)
  3.

    Rastrigin function

    $$\begin{array}{@{}rcl@{}} {f}_{3} = 20 + \sum\limits_{i=1}^{N}{{x_{i}^{2}} - 10\cos (2\pi x_{i})}. \end{array} $$
    (23)

    In this study, the Rastrigin function with 2 variables is used (N=2).

  4.

    Three-Hump Camel function

    $$ {f}_{4}=2{x_{1}^{2}}-1.05{x_{1}^{4}}+\frac{{x_{1}^{6}}}{6}+x_{1}x_{2}+{x_{2}^{2}}. $$
    (24)
  5.

    Colville function

    $$ \begin{array}{ll} {f}_{5}=& 100({x_{1}^{2}}-x_{2})^{2}+(x_{1}-1)^{2}+(x_{3}-1)^{2}+90({x_{3}^{2}}-x_{4})^{2}\\ &+10.1((x_{2}-1)^{2}+(x_{4}-1)^{2})+19.8(x_{2}-1)(x_{4}-1). \end{array} $$
    (25)
  6.

    Math 1

    $$ \begin{array}{ll} {f}_{6}=& (x_{1}-10)^{2}+5(x_{2}-12)^{2}+{x_{3}^{4}}+3(x_{4}-11)^{2}\\ &+10{x_{5}^{6}}+7{x_{6}^{2}}+{x_{7}^{4}}-4x_{6}x_{7}-10x_{6}-8x_{7}. \end{array} $$
    (26)
  7.

    Rosenbrock-10 function (Rosenbrock 1960)

    $$ {f}_{7}=\sum\limits_{n=1}^{N-1}\left( 100(x_{n+1}-{x_{n}^{2}})^{2}+(x_{n}-1)^{2}\right). $$
    (27)

    In this study, the Rosenbrock function with 10 variables is used (N=10).

  8.

    Math 2 (A 10-variable mathematical function)

    $$ {f}_{8}=\sum\limits_{m=1}^{10}\left( \frac{3}{10}+\sin\left(\frac{16}{15}x_{m}-1\right)+\sin^{2}\left(\frac{16}{15}x_{m}-1\right)\right). $$
    (28)
  9.

    Math 3 (A 16-variable mathematical function) (Jin et al. 2001)

    $$ {f}_{9}=\sum\limits_{m=1}^{16}\sum\limits_{n=1}^{16}a_{mn}({x_{m}^{2}}+x_{m}+1)({x_{n}^{2}}+x_{n}+1), $$
    (29)

    where the coefficients a mn are defined in Jin et al. (2001).

The properties of the test functions are summarized in Table 1.
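For reference, two of the benchmarks are sketched below in runnable form (the Rastrigin function, eq. (23), and the Rosenbrock function, eq. (27)); the dimensions N = 2 and N = 10 and the variable bounds follow Table 1 and are not restated here.

```python
import numpy as np

def rastrigin(x):
    """f_3 in eq. (23); used with N = 2 in this study."""
    x = np.asarray(x, dtype=float)
    return 10.0 * x.size + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x))

def rosenbrock(x):
    """f_7 in eq. (27); used with N = 10 in this study."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)
```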

Table 1 Mathematical test functions

4 Modelling and performance criteria for comparison

Standard statistical error analysis is used to evaluate the accuracy of the two RBF approaches. Details of this analysis are presented in this section.

4.1 Performance metrics

Two standard performance metrics are applied at the off-design test points: (i) the root mean squared error (RMSE) and (ii) the maximum absolute error (MAE). The lower the RMSE and MAE values, the more accurate the metamodel; the aim is to have these two error measures as close to zero as possible.

The RMSE is calculated by

$$\begin{array}{@{}rcl@{}} \text{RMSE} &=& \sqrt{\frac{{\sum}_{i=1}^{n}\left( \hat{f}_{i}-{f}_{i}\right)^{2}}{n}} \end{array} $$
(30)

and MAE is defined by

$$\begin{array}{@{}rcl@{}} \text{MAE}&=& \max\limits_{i} \vert \hat{f}_{i}-{f_{i}}\vert, \end{array} $$
(31)

where n is the number of off-design test points selected to evaluate the model, \(\hat {f}_{i}\) is the exact function value at the ith test point and f i represents the corresponding predicted function value.

RMSE and MAE are typically of the same order of magnitude as the actual function values, and therefore do not by themselves indicate the relative performance of the RBFs across different test functions. To compare the two approaches over the test functions, the errors are normalized by the actual function values. The normalized errors, NRMSE and NMAE, are calculated by

$$\begin{array}{@{}rcl@{}} \text{NRMSE} &=& \sqrt{\frac{{\sum}_{i=1}^{n}\left( \hat{f}_{i}-{f}_{i}\right)^{2}}{{\sum}_{i=1}^{n}\left( \hat{f}_{i}\right)^{2}}}, \end{array} $$
(32)
$$\begin{array}{@{}rcl@{}} \text{NMAE}&=& \frac{\max\limits_{i} \vert \hat{f}_{i}-{f_{i}}\vert}{\sqrt{\frac{1}{n}{\sum}_{i=1}^{n}\left( \hat{f}_{i}-\bar{f}\right)^{2}}}, \end{array} $$
(33)

where \(\bar {f}\) denotes the mean of the actual function values at the test points.
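A sketch of the four error measures (30)-(33) is given below; f_true holds the exact values \(\hat{f}_{i}\) at the n off-design test points and f_pred the metamodel predictions (the variable names are illustrative).

```python
import numpy as np

def error_metrics(f_true, f_pred):
    """Return RMSE, MAE, NRMSE and NMAE according to (30)-(33)."""
    f_true = np.asarray(f_true, dtype=float)
    f_pred = np.asarray(f_pred, dtype=float)
    e = f_true - f_pred
    rmse = np.sqrt(np.mean(e ** 2))                          # eq. (30)
    mae = np.max(np.abs(e))                                  # eq. (31)
    nrmse = np.sqrt(np.sum(e ** 2) / np.sum(f_true ** 2))    # eq. (32)
    nmae = mae / np.std(f_true)                              # eq. (33), std with 1/n
    return rmse, mae, nrmse, nmae
```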

In addition, the NRMSE and NMAE of the a priori RBF are compared to those of the a posteriori RBF by defining the corresponding relative differences. The relative difference in NRMSE (\(D^{NRMSE}\)) of the a posteriori RBF is given by

$$\begin{array}{@{}rcl@{}} {{D}_{RBF_{pos}}^{NRMSE}}&=& \frac{NRMSE_{RBF_{pos}}-NRMSE_{RBF_{pri}}}{NRMSE_{RBF_{pri}}} \times 100\%, \end{array} $$
(34)

and the relative difference in NMAE (\(D^{NMAE}\)) of the a posteriori RBF is defined by

$$\begin{array}{@{}rcl@{}} {{D}_{RBF_{pos}}^{NMAE}}&=& \frac{NMAE_{RBF_{pos}}-NMAE_{RBF_{pri}}}{NMAE_{RBF_{pri}}} \times 100\%, \end{array} $$
(35)

where the NRMSE and NMAE values of the \(RBF_{pos}\) approach are denoted \( NRMSE_{RBF_{pos}} \) and \( NMAE_{RBF_{pos}} \), and \( NRMSE_{RBF_{pri}} \) and \( NMAE_{RBF_{pri}} \) are the corresponding values of the \(RBF_{pri}\) approach.

4.2 Radial basis functions

Several different radial basis functions can be used when constructing the RBF, as mentioned in Section 2. Each will yield a different result depending on the nature of the problem. However, in real-world applications, the mathematical properties of the problem are usually not known in advance. Thus, a designer needs a robust choice of radial basis function that is as independent as possible of the nature of the problem and still results in an acceptably accurate metamodel. In this paper, four different radial basis functions, (i) linear, (ii) cubic, (iii) Gaussian and (iv) quadratic, formulated in (2), are used to study the effect of the radial basis function on the accuracy of the metamodels.

4.3 Sampling techniques

Sampling techniques are used to create the designs of experiments (DoEs) to which the RBFs are then fitted. A designer desires a robust sampling technique in order to avoid, as much as possible, dependencies on the sampling technique for different problems. In other words, one would like a metamodeling technique that is as independent as possible of the sampling technique. In this study, three different sampling techniques are chosen, (i) random sampling (RND), (ii) Latin hypercube sampling (LHS) and (iii) Hammersley sequence sampling (HSS), and their effects on the accuracy of the two approaches are investigated. For the optimization problems studied at the end of the paper, we also compare these sampling techniques to a successive screening approach for generating appropriate DoEs.

In random sampling, a desired set of uniformly distributed random numbers within the variable bounds of each test function is chosen. As expected, there is no uniformity in the resulting DoE. The Latin hypercube sampling technique creates samples that are relatively uniform in each single dimension, while subsequent dimensions are randomly paired to fill an m-dimensional cube. LHS can be regarded as a constrained Monte Carlo sampling scheme, developed by McKay et al. (1979) specifically for computer experiments. Hammersley sequence sampling produces more uniform samples over the m-dimensional space than LHS. This can be seen in Fig. 3, which illustrates the uniformity of a set of 15 sample points over a unit square using RND, LHS and HSS. Hammersley sequence sampling uses a low-discrepancy sequence (the Hammersley sequence) to uniformly place N points in an m-dimensional hypercube given by the following sequence:

$$ {Z}_{m}(n)=\left( \frac{n}{N}, \phi_{R_{1}}(n), \phi_{R_{2}}(n),...,\phi_{R_{m-1}}(n)\right),\qquad n=1,2,...,N, $$
(36)

where R 1 , R 2 , ..., R m−1 are the first m−1 prime numbers. The function ϕ R (n) is constructed by reversing the order of the digits of the integer n, written in radix-R notation, around the radix point. In this work, HSS is coded in Matlab based on the theory in the original paper by Kalagnanam and Diwekar (1997), where a detailed definition and the theory of Hammersley points can be found.
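The following sketch generates Hammersley points according to (36). The radical-inverse routine mirrors the digit-reversal construction described above; it is only an illustration and is not the authors' in-house Matlab implementation. The hard-coded prime list covers up to 16 dimensions, the largest dimension used in this study.

```python
import numpy as np

PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def radical_inverse(n, base):
    """Reverse the digits of n in radix `base` around the radix point."""
    inv, f = 0.0, 1.0 / base
    while n > 0:
        inv += (n % base) * f
        n //= base
        f /= base
    return inv

def hammersley(N, m):
    """N Hammersley points in the m-dimensional unit hypercube, eq. (36)."""
    Z = np.empty((N, m))
    for i, n in enumerate(range(1, N + 1)):
        Z[i, 0] = n / N
        for j in range(1, m):
            Z[i, j] = radical_inverse(n, PRIMES[j - 1])
    return Z
```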

Fig. 3
figure 3

Uniformity of different sampling techniques: (a) RND, (b) LHS, (c) HSS

4.4 Size of samples

The DoE size (the number of sample points) has an important effect on the accuracy of a surrogate model. In general, increasing the size of the DoE improves the quality of RBF metamodels; however, overfitting is a critical issue for these approaches. Three different sample sizes are used in this paper: (i) low, (ii) medium and (iii) high. The number of samples in each group is proportional to a reference value, defined separately for low- and high-dimensional problems. As the reference, the number of coefficients k = (m + 1)(m + 2)/2 of a second-order polynomial in m variables is used, and for all test functions the DoE size is chosen as a multiple of k. For the low-dimensional test functions the sample sizes are: (i) 1.5k for the low sample size, (ii) 3.5k for the medium sample size and (iii) 6k for the high sample size. For the high-dimensional test functions the DoE sizes are: (i) 1.5k for the low sample size, (ii) 2.5k for the medium sample size and (iii) 5k for the high sample size.
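This rule translates into the small helper below, where the factors 1.5/3.5/6 (low dimension) and 1.5/2.5/5 (high dimension) are taken directly from the text; the split at four variables follows the dimensionality grouping described in the next subsection, and rounding to an integer is an assumption.

```python
def doe_size(m, group):
    """DoE size from k = (m + 1)(m + 2)/2 and the size group ('low'/'medium'/'high')."""
    k = (m + 1) * (m + 2) // 2
    low_dim = m <= 4
    factor = {"low": 1.5,
              "medium": 3.5 if low_dim else 2.5,
              "high": 6.0 if low_dim else 5.0}[group]
    return int(round(factor * k))
```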

4.5 Test functions dimensionality

The dimension of a test function, i.e. the number of variables in the problem, is one of the most important properties affecting the generation of an accurate surrogate model. In order to investigate the effect of this modelling criterion on the two approaches, we divide the test functions into two categories: (i) low, where the number of variables is less than or equal to 4, and (ii) high, for test functions with more than 4 variables. Labelling the second group as “high” is relative to the first group; high-dimensional engineering problems generally involve considerably more variables. The results are grouped separately for low- and high-dimensional test functions for all modeling criteria, so that a final conclusion can be drawn by studying the results.

5 Comparison procedure

In this section, we describe the procedure used to compare the two metamodeling approaches (\(RBF_{pri}\) and \(RBF_{pos}\)) under the modeling criteria mentioned in the previous sections. The comparison is based on the nine mathematical test functions and the performance metrics described above. The comparison procedure is summarized in the following six steps:

  • Step 1: The size of the DoE is determined for each test function based on the three sample size groups (low, medium and high) in Table 2.

  • Step 2: The design domains are mapped linearly to the unit hypercube [0, 1]. The surrogate models are fitted to the mapped variables using the two approaches, and for calculating the performance metrics the metamodels are mapped back to the original space.

  • Step 3: To avoid, as far as possible, any sensitivity of the metamodels to a specific DoE, 50 distinct sample sets are generated for each sample size of Step 1 using the RND and LHS techniques described in the previous section. Since the HSS technique is deterministic, only one sample set per sample size is generated with this method. The Latin hypercube sampling is performed with the Matlab function "lhsdesign", where the samples are created with 20 iterations maximizing the minimum distance between points. The Hammersley (HSS) samples are created from the Hammersley quasi-random sequence, using successive primes as bases, with an in-house Matlab code.

  • Step 4: Metamodels are constructed using the two RBF approaches (\(RBF_{pri}\) and \(RBF_{pos}\)) with each of the four radial basis functions (linear, cubic, Gaussian and quadratic) for every DoE generated by the three sampling techniques. Therefore, for each test function 2 (RBF approaches) × 4 (radial basis functions) × 3 (sampling techniques) × 3 (sample sizes) × 50 (sets of DoE) = 3600 surrogate models are constructed.

  • Step 5: 1000 test points are randomly selected within the design space. The exact function value \(\hat {f}_{i}\) and the predicted function value f i are calculated at each test point. RMSE, MAE and the corresponding normalized values are computed using (30) to (33), and the normalized errors are averaged over the 50 sample sets. These averaged normalized root mean squared and maximum absolute errors are simply denoted NRMSE and NMAE in this paper. Finally, the relative difference measures (34) and (35) of the averaged errors are calculated for \(RBF_{pos}\). A compact sketch of Steps 1 to 5 is given at the end of this section.

  • Step 6: The procedure from Step 1 to Step 5 is repeated for all test problems. In addition to the mean normalized errors (NRMSE and NMAE), the average over the low-dimensional problems (the first five test functions), denoted “Ave. Low”, the average over the high-dimensional problems (test functions 6 to 9), denoted “Ave. High”, and the average error metrics over all nine test functions, denoted “Ave. All”, are computed for both surrogate approaches and all sampling techniques.

Table 2 Modeling criteria of test functions

It should be noted that, because the variables are mapped to the unit cube (Step 2), the parameter setting can be done without considering the magnitude of the design variables. Thus, the shape parameter 𝜃 used in the radial basis functions in (2) is set to one (𝜃 = 1). The bias chosen for this study, in (4), is a quadratic polynomial with six terms.
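The following is a compact sketch of Steps 1 to 5 for one test function and the LHS technique, using the helpers sketched in the earlier sections. SciPy's LatinHypercube generator stands in for Matlab's lhsdesign, and all names, seeds and defaults are illustrative assumptions rather than the authors' actual script.

```python
import numpy as np
from scipy.stats import qmc

def run_case(func, m, lo, hi, n_doe, kind="cubic", n_sets=50, n_test=1000, seed=0):
    """Average NRMSE and NMAE of the a priori RBF over n_sets LHS designs."""
    rng = np.random.default_rng(seed)
    X_test = rng.random((n_test, m))                         # test points in the unit cube
    f_exact = np.array([func(lo + (hi - lo) * x) for x in X_test])
    errors = []
    for s in range(n_sets):
        X = qmc.LatinHypercube(d=m, seed=s).random(n_doe)    # Step 3 (LHS shown here)
        f_hat = np.array([func(lo + (hi - lo) * x) for x in X])
        A, B = assemble(X, X, kind)                          # Step 4, centers = samples
        alpha, beta = fit_rbf_pri(A, B, f_hat)
        A_t, B_t = assemble(X_test, X, kind)
        f_pred = A_t @ alpha + B_t @ beta
        errors.append(error_metrics(f_exact, f_pred)[2:])    # Step 5: NRMSE, NMAE
    return np.mean(errors, axis=0)                           # averaged over the 50 sets
```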

6 Results and discussion

In this section, the results from the metamodels constructed according to the comparison procedure of the previous section are presented. The effect of each modeling criterion is discussed by comparing the two main error measures, NRMSE and NMAE, for the two RBF approaches in several tables and charts. Including all modeling criteria in the comparison of each criterion for all test functions would require an extensive and very detailed results section incorporating all 3600 surrogate models. This is out of the scope of this work and can be the topic of future studies. Therefore, when studying the effect of each modeling criterion, a specific selection of the other criteria is made; these selections are stated in the corresponding sections.

Before presenting the results, it is worth mentioning that the computational cost of the proposed \(RBF_{pri}\) is lower than that of \(RBF_{pos}\). This has been investigated by measuring the training time of the two approaches for test functions 3 and 8 with 100 variables and 15453 sampling points, using the cubic radial basis function and the HSS sampling method. The computational times for f 3 are 346.67 and 396.97 seconds for \(RBF_{pri}\) and \(RBF_{pos}\), respectively. Test function 8 is trained in 350.48 and 591.76 seconds using \(RBF_{pri}\) and \(RBF_{pos}\), respectively.

6.1 Effect of basis functions

Table 3 shows the NRMSE and NMAE values of \(RBF_{pri}\) and \(RBF_{pos}\) using the four different basis functions across all test problems, for the high sample size and the LHS sampling technique. The bold-faced values highlight the lowest errors for each test function. It can be seen that the basis function giving the minimum errors varies between the test functions. However, the cubic basis function results in lower values of both NRMSE and NMAE for f 1, f 8 and f 9. Also, by studying and comparing the results obtained from all 3600 constructed metamodels, one can conclude that the cubic basis function is the preferred choice when there is no prior knowledge of the mathematical properties of the problem, because of its robust behaviour under different criteria. This may be due to the absence of an extra parameter in the cubic radial basis function: no parameter setting or search for an optimal shape parameter is required.

Table 3 NRMSE and NMAE (LHS sampling with high sample size)

It is cumbersome to compare the two approaches under each modeling criterion using all the radial basis functions. Therefore, for each test function and modeling criterion a radial basis function is chosen, and the two metamodels are constructed using that radial basis function. Table 4 summarizes the chosen radial basis functions for each test function and modeling criterion.

Table 4 Summary of chosen basis functions

In cases where the best-performing basis function differs between the two approaches for a modeling criterion, the basis function that performs better with \(RBF_{pos}\) is selected. This enables a more reliable comparison between the two approaches.

6.2 Effect of sampling technique

The error measures of the surrogate models constructed by the two approaches using the three sampling techniques are shown in Table 5. The values are based on the basis functions chosen according to Table 4. Figure 4 summarizes the results in Table 5 by comparing the performance metrics of the “Ave. Low”, “Ave. High” and “Ave. All” rows. Observing the NRMSE values in Fig. 4a, the lowest errors for both approaches correspond to the HSS technique, followed by LHS, while random sampling gives the highest NRMSE values. The only exceptions, where LHS generates a better metamodel, are test function 3 (f 3) and the last high-dimensional test function (f 9). Considering the NMAE values in Fig. 4b, the HSS method again yields the lowest errors. Also, the low-dimensional problems perform better with the random sampling technique than with the LHS technique. Both \(RBF_{pri}\) and \(RBF_{pos}\) perform better when the LHS sampling technique is used for high-dimensional problems (Fig. 4); however, this gain is marginal compared to the two other techniques.

Table 5 NRMSE and NMAE of each sampling technique (high sample size)
Fig. 4
figure 4

Comparison of different sampling techniques: (a) normalized root mean squared error (NRMSE); (b) normalized maximum absolute error (NMAE)

The “Ave. All” bars in Fig. 4a and b, along with the data in Table 5, show improvements of 4.7 % in NRMSE and 6.8 % in NMAE when using the HSS technique instead of LHS with the \(RBF_{pri}\) approach, while the corresponding values are 11.1 % and 14.3 % for the \(RBF_{pos}\) approach. These percentages show that \(RBF_{pri}\) is more robust than \(RBF_{pos}\) in terms of NRMSE and NMAE with respect to a change of sampling technique.

6.3 Effect of sampling size

Figure 5a depicts the NRMSE values of “Ave. Low”, “Ave. High” and “Ave. All” for the three different sample sizes using the LHS sampling technique, while Fig. 5b shows the corresponding NMAE values. Both metamodeling approaches improve in quality with increasing sample size, regardless of the dimension of the problem. Table 6 shows the relative differences (in percent) of NRMSE and NMAE between \(RBF_{pri}\) and \(RBF_{pos}\) for the different sample sizes. The \(RBF_{pos}\) approach performs better than \(RBF_{pri}\) for the low sample size considering both error metrics; the negative percentage values in Table 6 reveal the exact degree of superiority. However, when the sample size is increased to medium and high, the performance changes and \(RBF_{pri}\) appears to be the dominant approach. This advantage is noticeable in the NMAE values, in contrast to the marginal improvement of the NRMSE values when using \(RBF_{pri}\). Especially for high-dimensional problems with the medium sample size the relative difference is less than one percent (0.74 %). From the “Ave. All” row of Table 6 we observe a 13.2 % improvement in the NMAE value when using \(RBF_{pri}\) with the high sample size, and 4 % better accuracy in the NRMSE value.

Fig. 5
figure 5

Comparison of different sample sizes using the LHS technique: (a) normalized root mean squared error (NRMSE); (b) normalized maximum absolute error (NMAE)

Table 6 Relative differences of NRMSE and NMAE comparing \(RBF_{pri}\) and \(RBF_{pos}\) with respect to sample size

6.4 Effect of dimension

The effect of the dimension of the test function on metamodel performance can be studied by summarizing the average NRMSE and NMAE values of the low- and high-dimensional problems in Table 7. The values are obtained by averaging the “Ave. Low” and “Ave. High” rows over all sampling techniques in Table 5. Considering the NRMSE, the \(RBF_{pri}\) approach performs better for low-dimensional problems, while the \(RBF_{pos}\) approach generates better performance metrics for high-dimensional problems. Although the advantage of \(RBF_{pos}\) in high-dimensional problems compared to low-dimensional ones is only around 1 % for NRMSE, it increases to approximately 10 % for the NMAE metric. Correspondingly, \(RBF_{pri}\) performs around 9.5 % better for low-dimensional problems than for high-dimensional ones in terms of NMAE. In addition, the last two rows in Table 7 compare the performance of \(RBF_{pri}\) and \(RBF_{pos}\) for low- and high-dimensional problems separately. The results confirm the advantage of \(RBF_{pri}\) over \(RBF_{pos}\) in low-dimensional problems, with a superiority of 4.6 % and 17.2 % with respect to NRMSE and NMAE, respectively. On the other hand, the better performance of \(RBF_{pos}\) compared to \(RBF_{pri}\) for the high-dimensional test functions is marginal, around 2 % for both NRMSE and NMAE.

Table 7 NRMSE, NMAE and their relative differences, averaged over all sampling techniques

6.5 Overall accuracy

For the test functions with two input variables (the first four test functions), the three-dimensional surface plots are shown in Figs. 6, 7, 8 and 9. The plots depict the actual functions and the corresponding metamodels constructed using \(RBF_{pri}\) and \(RBF_{pos}\). The metamodel surfaces are generated using the same set of DoE for each test function, created with the HSS technique and the high sample size. The overall accuracy of \(RBF_{pri}\) and \(RBF_{pos}\) can be compared by observing the surface plots and by using Table 8. The table presents the relative differences in NRMSE and NMAE (in percent) together with the averages over the low-dimensional, the high-dimensional and all test functions. The values are extracted for each metamodeling approach using the basis functions of Table 4, the three sampling techniques and the high sample size. With regard to the NRMSE relative differences, 6 out of 9 test functions using LHS and 7 out of 9 test functions using RND and HSS have a positive percentage, which clearly reveals the advantage of the new approach over \(RBF_{pos}\). This advantage is even more pronounced for NMAE, with 8 test functions showing positive relative differences for all sampling techniques. The average rows show approximately 3 % better performance of \(RBF_{pri}\) in the NRMSE values for RND and LHS and 16 % for HSS, regardless of the dimension of the test functions, while this superiority is around 16 %, 13 % and 23 % for the NMAE values with RND, LHS and HSS, respectively. This difference demonstrates the strength of \(RBF_{pri}\) in predicting the local deviations of the functions, which is what the MAE metric measures. On the other hand, the superiority of \(RBF_{pri}\) in terms of the global error is minor compared to the \(RBF_{pos}\) approach when RND and LHS are used.

Fig. 6
figure 6

Test function 1: Branin function (a) actual function; (b) \(RBF_{pri}\); (c) \(RBF_{pos}\)

Fig. 7
figure 7

Test function 2: Goldstein-Price function (a) actual function; (b) \(RBF_{pri}\); (c) \(RBF_{pos}\)

Fig. 8
figure 8

Test function 3: Rastrigin function (a) actual function; (b) \(RBF_{pri}\); (c) \(RBF_{pos}\)

Fig. 9
figure 9

Test function 4: Three-Hump Camel function (a) actual function; (b) \(RBF_{pri}\); (c) \(RBF_{pos}\)

Table 8 Overall accuracy performance of \(RBF_{pri}\) over \(RBF_{pos}\)

7 Optimization examples

RBF is a most attractive choice for surrogate models in metamodel based design optimization. This is demonstrated here by studying two examples using our approach of RBF with a priori bias. We begin our study with the following non-linear example:

$$ \left\{\begin{array}{l} \displaystyle{\min_{x_{i}} \, \sqrt{1000\left( \frac{4}{x_{1}}-2\right)^{4}+1000\left( \frac{4}{x_{2}}-2\right)^{4}} } \\ \text{s.t. } (x_{1}-0.5)^{4}+(x_{2}-0.5)^{4}-2 \leq 0. \end{array}\right. $$
(37)

The analytical optimal solution is (1.5,1.5) and the minimum of the unconstrained objective function is found at (2,2). The objective function is plotted in Fig. 10.
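As a quick sanity check (not part of the paper's procedure), the analytical optimum of (37) can be reproduced by a direct constrained solve; the starting point and box bounds below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

obj = lambda x: np.sqrt(1000 * (4 / x[0] - 2) ** 4 + 1000 * (4 / x[1] - 2) ** 4)
# SLSQP expects inequality constraints in the form g(x) >= 0
con = {"type": "ineq", "fun": lambda x: 2 - (x[0] - 0.5) ** 4 - (x[1] - 0.5) ** 4}

res = minimize(obj, x0=[1.0, 1.0], method="SLSQP",
               bounds=[(0.5, 3.0), (0.5, 3.0)], constraints=[con])
print(res.x)   # approximately (1.5, 1.5)
```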

Fig. 10
figure 10

Analytical “black-box” function (a) contour plot; (b) four successive iterations generating 12 sample points; (c) contour plot of the RBF of the objective for the DoE with 12 sample points; (d) contour plot of the augmented DoE with 12+3 sample points. The three augmented points are marked with a cross

The problem in (37) is now solved by performing a DoE procedure and setting up corresponding RBFs, which in turn define a new optimization problem that is solved using a global search with a genetic algorithm and a local search with sequential linear and/or quadratic programming. First, a set of sampling points is generated by successive linear response surface optimization of the problem in (37), using four successive iterations with automatic panning and zooming (Gustafsson and Strömberg 2008). This screening generates 12 sampling points according to Fig. 10. Then, RBFs are fitted to this DoE and an optimal point is identified. The DoE is then augmented with this optimal point and the RBFs are set up again. This procedure is repeated three times, generating in total a DoE with 12 sampling points from screening and three optimal points from the RBFs. Finally, metamodel based design optimization using our RBFs is performed for this DoE of 12+3 sampling points. The optimal solution generated with this procedure is (1.4962, 1.5049), which is very close to the analytical optimum of (37).

The DoEs presented in Fig. 3 are also studied for this example. The corresponding RBFs are set up, and the optimal solutions for the random, LHS and HSS DoEs are obtained as (1.7842, 1.8003), (1.5618, 1.5076) and (1.5698, 1.553), respectively. It is clear that not only the choice of metamodel but also the choice of DoE influences the result. The solution from the random DoE is poor, while the solutions from the LHS and HSS DoEs are similar and acceptable. Thus, the successive screening procedure with optimal augmentation for generating the DoE is superior for this problem, which is a general observation we have made for many examples. We have also used this strategy to solve reliability based design optimization problems using metamodels. This is discussed in a recent paper by Strömberg (2016), where this first example is also formulated as a reliability based design optimization (RBDO) problem and is solved for variables with non-Gaussian distributions using a SORM-based RBDO approach. We also consider the following well-known engineering benchmark of a welded beam:

$$ \left\{\begin{array}{l} \displaystyle{ \min_{x_{i}} \, 1.10471{x_{1}^{2}}x_{2}+0.04811x_{3}x_{4}(14+x_{2}) }\\ \text{s.t. } \left\{ \begin{array}{l} \tau(x_{i})-13600\leq 0 \\ \sigma(x_{i}) - 30000\leq 0\\ x_{1}-x_{4} \leq 0\\ 0.125 -x_{1} \leq0\\ \delta(x_{i}) -0.25 \leq0\\ 6000-P_{c}(x_{i}) \leq 0, \end{array}\right. \end{array}\right. $$
(38)

where definitions of the shear stress τ(x i ), the normal stress σ(x i ), the displacement δ(x i ) and the critical force P c (x i ) can be found in e.g. the recent paper by Garg (2014), where several solutions obtained by different algorithms are also presented. In addition, the variables are bounded by 0.1 ≤ x 1 , x 4 ≤ 2 and 0.1 ≤ x 2 , x 3 ≤ 10. We obtain the analytical solution (0.24437, 6.2175, 8.2915, 0.24437), which is more or less identical to the solution obtained by Garg: (0.24436, 6.2177, 8.2916, 0.24437).

Now, we instead solve this problem by generating a set of sampling points to which RBFs are fitted and then optimized. The procedure is similar to the one presented above. First, 45 sampling points are generated by quadratic response surface screening; this set of points is presented in Fig. 11. The choice of quadratic instead of linear screening is motivated by the non-linear constraint domain, since linear screening might result in an empty feasible domain. After screening, 15 additional points are added, generated as the optima of sequentially augmented RBFs. Finally, for the 45+15 sampling points, we set up the corresponding RBFs and obtain the optimal solution (0.414710, 3.925900, 6.620100, 0.414710). This solution satisfies almost all constraints in (38): the first constraint is slightly violated, τ(x i ) = 13729 > 13600, but the five other constraints are fully satisfied. The value of the cost function is 3.113586, compared to the analytical optimum value of 2.381. This solution could of course be improved further by augmenting the DoE with additional optimal points.

Fig. 11
figure 11

Successive quadratic response surface screening generating 45 sampling points. The plots show the same DoE from two different views

8 Concluding remarks

In this paper, a new approach for setting up a radial basis function network is proposed, in which the bias is defined a priori by a corresponding regression model. Our new approach is compared with the established treatment of RBF, where the bias is obtained by using extra orthogonality constraints. It is shown numerically that our approach with a priori bias in general performs as well as the RBF with a posteriori bias. In addition, we argue that our approach is easier to set up and interpret: the bias captures the global behavior, while the radial basis functions tune the local response. It is also demonstrated that our RBF with a priori bias performs excellently in metamodel based design optimization and accurately captures both the coarse and the dense sampling densities of DoEs generated by successive screening and optimal augmentation. In conclusion, the paper shows that our new RBF approach with a priori bias is a most attractive choice of surrogate model. We believe that our approach has promising potential and opens up new possibilities for surrogate modelling in optimization, which we hope to explore in the near future.