1 Introduction

The optimization of systems has been approached in different ways depending on the problem type. The field of multidisciplinary design optimization has evolved rapidly over the past decades, with various methods and techniques developed to address the optimization of systems with many components or disciplines. In multidisciplinary design optimization the primary focus has been on systems whose disciplines interact with one another through coupling variables. Various techniques have been developed that help maintain consistency between these disciplines during the optimization process (Martins and Lambe 2013).

A majority of systems fall under the category of problems in which multiple components have complex interactions with one another. However, a subset of engineering and non-engineering systems consists of components that are not interdependent. Such systems may contain several components, but the responses of these components can be evaluated independently of one another. The resulting bi-level problem has a hierarchical structure: at the lower level, each component in the system is a function of design variables only, and no coupling variables are exchanged between components. In other words, the input-output relation of a component is not influenced by the response of any other component. For this type of problem there is no need to maintain consistency between the different disciplines or components, and the system optimization process is therefore much simpler.

In addition, the components may be expensive to evaluate while the system transformation is cheap. Systems of this type are common in the field of integrated optics (Lifante 2003). The response of integrated optical components such as power splitters, directional couplers and phase shifters can be obtained independently of other components. Finding the component response usually requires computationally expensive electromagnetic simulations. Multiple such components can be combined into complex integrated optical systems. An example of such a system is an integrated optical serial ring resonator (Ahmed et al. 2011).

This work addresses optimization of deterministic hierarchical systems based on independent components that are expensive to evaluate. It is assumed that, once the component responses are available, the system response is cheap to compute. In order to optimize the system we construct response surfaces of the underlying components. The accuracy of the response surfaces depends on the number of samples and their locations. The challenge is to perform the optimization with a desired accuracy at very low computational cost. The response surfaces are built using Kriging (Sacks et al. 1989). Kriging provides an estimator of the Mean Squared Error (MSE) in its interpolation. This MSE was used by Jones et al. (1998) to develop the Expected Improvement (EI) criterion in Efficient Global Optimization (EGO). The EI criterion enables adaptive sampling of unconstrained problems in order to estimate the optimum using relatively few expensive simulations.

We develop a system level EI criterion that is derived from a system level MSE. The system MSE is found by performing a linear transformation of the component level MSE generated by each component metamodel. The system level EI suggests a potential location to be adaptively sampled at each iteration of the algorithm and the component metamodels are updated at the corresponding location. The iterative update of the component metamodels results in a higher fidelity system response with each passing iteration. The process enables the optimum of the system to be estimated using relatively few component simulation calls.

Metamodels have often been used in multidisciplinary optimization (Viana et al. 2014). Improving systems by adaptively sampling component metamodels is also a fairly well known concept (Li et al. 2006; Hu et al. 2011). Optimization of multilevel decomposed systems that have a hierarchical structure has also received attention (Kokkolaras et al. 2004). Similarly, Barton (1997) employed different strategies for component metamodel construction and update for system level optimization. Research has also been performed on partitioning expensive problems with a large number of variables so that they can be treated using component metamodels in a multilevel system framework (Koch et al. 2000; Haftka 2005). On the other hand, Xiu et al. (2013) specifically studied the quantification of error at system level due to component level metamodels in a hierarchical system.

This work also aims to optimize a multi-level problem with a hierarchical structure. The bi-level problem consists of a cheap system transformation at the upper level and expensive component models at the lower level. To our knowledge, there has been no previous work on a system level expected improvement based sampling strategy for optimization of systems involving expensive to simulate components that are independent of each other. The proposed approach can efficiently optimize such systems, provided the number of component variables is so small that high fidelity component metamodels can be built using a reasonable number of simulations.

The method is especially relevant for systems in which the component behavior is easier to approximate than the system behavior. It may also be advantageous in situations where many similar components are present in the system, since a single metamodel can then often replace multiple components. Furthermore, the approach is attractive for fields in which systems are typically composed of selections from a small set of components.

The paper is organized as follows. The system level optimization problem of systems with independent components is described in Section 2. In Section 3, we introduce Kriging and EGO and then present the system level EI approach for adaptive sampling of the system. The algorithm is tested on several well known numerical test problems and the result is analyzed in Section 4. The application of the algorithm on a practical engineering problem is shown in Section 5. Finally, Section 6 contains the conclusions and suggestions for future work.

2 Optimization of systems with independent components

We introduce unconstrained optimization of the system in this section, but the method also applies to constrained problems. The constrained problem will be discussed in Section 3.1.3.

Let S represent the response of a system of a set of \(\mathcal{N}\) uncoupled components c. Each component \(c_i\), where \(\left\{i \mid i \in \mathbb{N}, i \leq \mathcal{N}\right\}\), is a function of design variables \(\mathbf{x}_i \in \mathbb{X}_i \subseteq \mathbb{X}\). Since design variables may be shared across components, the sets \(\mathbb{X}_1\) to \(\mathbb{X}_{\mathcal{N}}\) are not necessarily disjoint. In addition, some design variables \(\mathbf{x}_s \in \mathbb{X}_s\) may be present only at system level, or at both component and system level. The union of the sets \(\mathbb{X}_i\), \(i = 1,\dots,\mathcal{N}\), and \(\mathbb{X}_s\) is the set \(\mathbb{X}\). We define design variables that are present in only a single component as local variables. Design variables that are present in multiple components, or at both component and system level, are referred to as global variables. The optimization of the system response can be written as,

$$ \mathrm{\underset{\mathbf{x} \in \mathbb{X}} {min}} \quad S(c_{1}(\mathbf{x}_{1}),c_{2}(\mathbf{x}_{2}),c_{3}(\mathbf{x}_{3}),\dots, c_{\mathcal{N}}(\mathbf{x}_{\mathcal{N}}),\mathbf{x}_s). $$
(1)

If the system response is based on expensive to simulate components, then applying optimization directly on the system response is prohibitively costly. A more viable option is to construct lower dimensional (and therefore cheaper) metamodels \(\mathcal{K}_{ci}\) of the components and to apply optimization on the resulting system,

$$ \mathrm{\underset{\mathbf{x} \in \mathbb{X}} {min}} \quad S_{\mathcal{K}}(\mathcal{K}_{c1}(\mathbf{x}_{1}),\mathcal{K}_{c2}(\mathbf{x}_{2}),\dots, \mathcal{K}_{c\mathcal{N}}(\mathbf{x}_{\mathcal{N}}),\mathbf{x}_s). $$
(2)

The optimization problem in (2) is visualized in Fig. 1a. In this work, we construct the metamodels using Kriging which provides an estimator of the error in its interpolation. We then adaptively sample the component metamodels in such a way that the system optimum is found using only a limited number of expensive simulations of the components.

Fig. 1 a The process of optimization of the system based on metamodels of the independent components. b A type of problem in which dependence exists between the underlying components of the system

The fidelity of the system response that is based on component metamodels will heavily depend on the error of the underlying component response surfaces. Furthermore, the relative amount of system error is also governed by the operation that is performed on the component responses in order to arrive at the system response. This operation could turn a small component error into a large error contribution on the system level and vice versa. Therefore, it is important that any adaptive sampling of the components takes into account the error generated at system level. In the following section we show how such an adaptive sampling scheme can be developed.

3 Infill sampling criterion for system level optimization

3.1 Kriging and efficient global optimization

3.1.1 Kriging

Kriging is a metamodelling method that constructs the best linear unbiased predictor through a set of responses based on certain statistical assumptions. The most important of these assumptions is that the closer two input locations (points) are, the more correlated their function responses (outputs) are. Kriging is also a popular method for constructing response surfaces of fully deterministic simulation results (Forrester et al. 2008). Before introducing our new method in Section 3.2, we provide a condensed explanation of the main steps in Kriging metamodel construction. Some equations are included in the Appendix, for reference. Sacks et al. (1989) provide a full description of Kriging.

Kriging uses a tunable basis function, which usually takes the form of a parameterized Gaussian correlation function, (3), to measure the correlation between the outputs at sample points. This correlation is given by the following expression

$$ \text{Corr} \left[ Y({\mathbf{x}_{i}}), Y({\mathbf{x}_{j}}) \right] = \exp\left( -\sum\limits_{q=1}^{k} \theta_{q}\left( {{x}_{iq}} - {{x}_{jq}} \right)^{2}\right) $$
(3)

where \(\mathbf{x}_i\) and \(\mathbf{x}_j\) are any two locations in the domain, k is the number of dimensions, and \(\theta_q\), \(\mu\) and \(\sigma^2\) are the parameters to be estimated via Maximum Likelihood Estimation (MLE).

The Kriging prediction \(\hat{y}\) at a previously unsampled location uses the MLEs of the Kriging parameters and is approximated by

$$ \hat{y}(\mathbf{x}) = \hat{\mu} + {\hat{\mathbf{r}}^{\mathrm{T}}}{\hat{\mathbf{R}}^{-1}}(\mathbf{y}-\mathbf{1}\hat{\mu}) $$
(4)

where \(\hat {\mu }\) is the estimated mean, \({\hat {\mathbf {R}}}\) is the N × N estimated correlation matrix between the N samples, \(\hat {\mathbf {r}}\) is the vector of estimated correlations between the observed data and the new prediction and y is the observed response.

An important aspect of Kriging is that a potential error in the interpolation can be estimated. The MSE estimate of the Kriging prediction is approximated by the following equation, which neglects the uncertainty introduced by the estimation of the unknown Kriging parameters,

$$ {s}^{2}(\mathbf{x}) = \hat{\sigma}^{2} \left[ 1 - {\hat{\mathbf{r}}^{\mathrm{T}}}{\hat{\mathbf{R}}^{-1}}{\hat{\mathbf{r}}} + \frac{\left(1-\mathbf{1}^{\mathrm{T}}{\hat{\mathbf{R}}^{-1}}{\hat{\mathbf{r}}}\right)^{2}}{\mathbf{1}^{\mathrm{T}}{\hat{\mathbf{R}}^{-1}}\mathbf{1}} \right], $$
(5)

where \(\hat{\sigma}^{2}\) is the MLE of \(\sigma^{2}\). The error increases as the distance between the new point and the existing sample points increases; if the new point coincides with a sampled point, then the error is exactly zero.
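For concreteness, (4) and (5) translate into a few lines of linear algebra. The sketch below assumes that the hyperparameters \(\theta\), \(\hat{\mu}\) and \(\hat{\sigma}^2\) have already been estimated via MLE; the function names are ours and do not refer to any particular Kriging library.

```python
import numpy as np

def corr_matrix(X, theta):
    # Gaussian correlation (3) between all pairs of the N sample points.
    d = X[:, None, :] - X[None, :, :]
    return np.exp(-np.sum(theta * d**2, axis=2))

def kriging_predict(x, X, y, theta, mu_hat, sigma2_hat):
    """Prediction y_hat(x), (4), and MSE s2(x), (5), at a new point x."""
    R = corr_matrix(X, theta)
    r = np.exp(-np.sum(theta * (X - x)**2, axis=1))  # correlations to x
    ones = np.ones(len(y))
    Rinv_r = np.linalg.solve(R, r)
    Rinv_1 = np.linalg.solve(R, ones)
    y_hat = mu_hat + r @ np.linalg.solve(R, y - mu_hat * ones)
    s2 = sigma2_hat * (1.0 - r @ Rinv_r
                       + (1.0 - ones @ Rinv_r)**2 / (ones @ Rinv_1))
    return y_hat, s2
```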

3.1.2 Efficient global optimization

The combination of the Kriging prediction \(\hat{y}(\mathbf{x})\) and the MSE \(s^2(\mathbf{x})\) can be used to adaptively sample the design domain in order to quickly estimate the global minimum (Jones et al. 1998). One of the most effective Kriging based adaptive sampling criteria is that of Expected Improvement (EI). EI is formulated by exploiting the fact that Kriging assumes a Gaussian process, so that the uncertainty at \(\mathbf{x}\) can be described by a normally distributed random variable \(Y(\mathbf{x})\) whose marginal distribution has mean \(\hat{y}(\mathbf{x})\) and variance \(s^2(\mathbf{x})\). An improvement over the current observed minimum \(y_{\min}\) can be made at \(\mathbf{x}\) if some part of the distribution of \(Y(\mathbf{x})\) lies below \(y_{\min}\). The Probability of Improvement (PI) over \(y_{\min}\) may be expressed as,

$$ P[I(\mathbf{x})] = \mathrm{\Phi} \left( \frac{y_{\min}-\hat{y}}{s} \right), $$
(6)

where s is the square root of \(s^2\) in (5) and \(\mathrm{\Phi}(.)\) is the normal cumulative distribution function. A new sampling location that gives the highest probability of improvement over \(y_{\min}\) can be found by maximizing (6). The use of this infill sampling criterion will be discussed in more detail in Section 3.1.3, where it is applied to constrained problems.

An alternative method of infill sampling involves finding the expectation of the amount of improvement over \(y_{\min}\). The EI can be calculated by taking the expectation \(E[I(\mathbf{x})] = E[\max(y_{\min}-Y,0)]\). A computationally cheap analytical expression for EI can be found in terms of the normal cumulative distribution function \(\mathrm{\Phi}(.)\) and the normal probability density function \(\phi(.)\) (Schonlau and Welch 1996). Under the condition that \(\hat{y}<y_{\min}\), the EI over the minimum observed response at any location \(\mathbf{x}\) is

$$ E[I(\mathbf{x})] = (y_{\min}-\hat{y})\mathrm{\Phi} \left( \frac{y_{\min}-\hat{y}}{s} \right)+s\phi\left( \frac{y_{\min}-\hat{y}}{s}\right). $$
(7)

A new sampling location that gives the maximum expected improvement can be found by estimating the maximizer of (7).

The adaptive sampling strategy can be explained as follows. An initial metamodel \(\mathcal{K}_f\) is constructed using n samples chosen via Design of Experiments, e.g. Latin Hypercube Sampling (LHS). Equation (7) is then maximized to estimate the new adaptive sampling location \(\mathbf{x}_{new}\). The new sampling location and its response are added to the set of samples and responses, and the metamodel is reconstructed with the augmented set. This is repeated until either \(\text{EI}_{\max}\), the maximum EI at the last iteration, becomes less than a threshold \(\epsilon_{EI}\), or the computational budget \(N_T\) is exhausted. The algorithm is referred to as Efficient Global Optimization (EGO) (Jones et al. 1998) and forms the main inspiration for the present work.
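To make the loop concrete, the following sketch implements EI, (7), and the EGO iteration just described. The routine fit_kriging() is a hypothetical stand-in for any Kriging fitting implementation (for instance one built on the predictor sketched above); it is assumed to return a model with a predict(x) method yielding \((\hat{y}, s^2)\).

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import differential_evolution

def expected_improvement(y_hat, s, y_min):
    # Analytical EI (7); defined as zero where the error estimate vanishes.
    if s <= 0.0:
        return 0.0
    z = (y_min - y_hat) / s
    return (y_min - y_hat) * norm.cdf(z) + s * norm.pdf(z)

def ego(f, X, y, bounds, budget, eps_ei=1e-6):
    """f: expensive simulator; (X, y): initial DoE; bounds: box constraints."""
    while len(y) < budget:
        model = fit_kriging(X, y)                     # hypothetical helper
        def neg_ei(x):
            y_hat, s2 = model.predict(x)
            return -expected_improvement(y_hat, np.sqrt(max(s2, 0.0)), y.min())
        res = differential_evolution(neg_ei, bounds)  # global search of (7)
        if -res.fun < eps_ei:                         # EI_max below threshold
            break
        X = np.vstack([X, res.x])                     # augment sample set
        y = np.append(y, f(res.x))
    return X[np.argmin(y)], y.min()
```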

3.1.3 Optimization of constrained problems

An infill sampling strategy based on Kriging for computationally expensive constrained problems was first suggested by Schonlau (1997). The method involved building metamodels of the objective and constraints and iteratively improving both sets of metamodels via a single adaptive sampling scheme. This infill sampling criterion was simply the product of the probability of improvement of the objective and the probability of feasibility of the constraint.

The probability of feasibility measure operates in much the same way as the probability of improvement for the objective. Metamodels are built for the expensive to evaluate constraint(s). Let the Kriging prediction and MSE for such a constraint metamodel be given by \(\hat{h}(\mathbf{x})\) and \(s^{2,(h)}(\mathbf{x})\). The metamodel uncertainty for each constraint metamodel is again described in terms of a normally distributed random variable \(H(\mathbf{x})\) with mean \(\hat{h}(\mathbf{x})\) and variance \(s^{2,(h)}(\mathbf{x})\).

Let \(h_{min}\) be the constraint limit, so that a design at \(\mathbf{x}\) is predicted to be feasible when \(\hat{h}(\mathbf{x}) < h_{min}\). In standard notation \(h_{min} = 0\), so the constrained problem at \(\mathbf{x}\) remains feasible as long as the constraint is less than or equal to zero; for an example we refer to (30).

The probability of feasibility is given by the area of the distribution \(H(\mathbf{x})\) that is below the constraint limit \(h_{min}\),

$$ P[F(\mathbf{x}) < h_{min}] = \mathrm{\Phi} \left( \frac{h_{{min}}-\hat{h}}{s^{(h)}} \right). $$
(8)

A possible choice for a new infill sampling criterion for a constrained problem is then the product of the probability of improvement of the objective and the probability of feasibility of the constraints. For a problem with a single constraint this may be written as,

$$ PI_{F} = P[I(\mathbf{x})]P[F(\mathbf{x})<h_{min}] $$
(9)

Alternatively, the EI in the objective can be multiplied by the product of the probability of feasibility of the constraints,

$$ EI_{F} = E[I(\mathbf{x})]P[F(\mathbf{x})<h_{min}]. $$
(10)

3.2 System level EI criterion

The aim of this work is to optimize (2) by means of building and iteratively improving a set of component metamodels such that the optimum of the original system problem, (1), is estimated. An additional goal is to solve system problems involving constraints. Changes required in the iterative sampling strategy to accommodate this goal are discussed in Section 3.3.

The algorithm is initialized by building Kriging metamodels of all components based on a set of initial samples. In a software implementation, these could also be pre-built library metamodels. The MSE of the individual component metamodels, denoted as \({s^{2}_{i}}(\mathbf {x})\), follows from (5). Let the system level response be given by

$$ \hat{y}_{\text{sys}} = S_{\mathcal{K}}(\hat{y}_{1}(\mathbf{x}_{1}),\hat{y}_{2}(\mathbf{x}_{2}),\dots, \hat{y}_{\mathcal{N}}(\mathbf{x}_{\mathcal{N}}),\mathbf{x}_{s}), $$
(11)

where \(\hat {y}_{i}\) is the Kriging prediction of each component \(\mathcal {K}_{ci}\).

In order to update the metamodels in regions of interest for system optimization, an adaptive sampling strategy at system level is required. To derive such a system infill sampling criterion, an error estimate is needed of the system level response, (11).

The EI criterion in Section 3.1 was found by assuming that the uncertainty in the predicted value \(\hat{y}(\mathbf{x})\) at a position \(\mathbf{x}\) can be described in terms of a normally distributed random variable \(Y(\mathbf{x})\) with mean given by the Kriging prediction \(\hat{y}(\mathbf{x})\) and variance given by the Kriging MSE \(s^2(\mathbf{x})\). Following the same process, we assume that the uncertainty in the system level response \(\hat{y}_{\text{sys}}\) at a position \(\mathbf{x}\) can be expressed in terms of a random variable \(Y_{sys}(\mathbf{x})\) with mean \(\hat{y}_{\text{sys}}\) and variance \(s_{sys}^{2}\). Furthermore, we retain the assumption that the uncertainty in the predicted value of each component \(\mathcal{K}_{ci}\) can be described as a normally distributed random variable \(Y_i(\mathbf{x})\) with mean given by the Kriging prediction \(\hat{y}_i\) and variance given by the Kriging MSE \(s^2_i\).

The random variable describing the system level uncertainty, \(Y_{sys}(\mathbf{x})\), can then be described in terms of the system operation on the component level random variables \(\mathbf{Y} = [Y_1, Y_2, \dots, Y_{\mathcal{N}}]\) and the deterministic system variables \(\mathbf{x}_s\),

$$ Y_{sys}(\mathbf{x}) = S_{\mathcal{K}}({Y}_{1}(\mathbf{x}_{1}),{Y}_{2}(\mathbf{x}_{2}),\dots,{Y}_{\mathcal{N}}(\mathbf{x}_{\mathcal{N}}),\mathbf{x}_s), $$
(12)

where the normal random variables \(Y_1(\mathbf{x}_1), Y_2(\mathbf{x}_2), \dots, Y_{\mathcal{N}}(\mathbf{x}_{\mathcal{N}})\) are treated as being independent. Theoretically, these variables may be dependent, but in practice this dependence may be ignored (Kleijnen and Mehdad 2014).

A linear approximation of the right hand side of (12) can be derived by performing the Taylor series expansion of \(S_{\mathcal {K}}(\mathbf {Y},\mathbf {x}_s)\) about the mean values \(\hat {y_{i}}\) of Y and truncating the series to include only the linear terms, i.e. the first two terms,

$$\begin{array}{@{}rcl@{}} {S_{\mathcal{K}}}(\mathbf{Y},\mathbf{x}_{s}) &=& {S_{\mathcal{K}}}({\hat{y}}_{1},{\hat{y}}_{2},\dots, {\hat{y}}_{\mathcal{N}}, \mathbf{x}_{s})\\ && + \sum\limits_{i=1}^{\mathcal{N}}(Y_{i}-{\hat{y}}_{i})\frac{\partial {S_{\mathcal{K}}}}{\partial {Y_{i}}}\left|{~}_{\hat{y}_{i}}\right. . \end{array} $$
(13)

Since the above expansion is linear, \(Y_{sys}\) is a normal random variable (Taboga 2012). It can be shown (Ayyub 2003) that the first order mean of \(Y_{sys}\) is given by

$$ E(Y_{sys}) = \hat{y}_{\text{sys}} = S_{\mathcal{K}}[\hat{y}_{1}, \hat{y}_{2}, \dots, \hat{y}_{\mathcal{N}},\mathbf{x}_s] $$
(14)

and the first order variance of \(Y_{sys}\) is given by

$$\begin{array}{@{}rcl@{}} \text{Var}(Y_{sys}) = s_{sys}^{2} &=& \sum\limits_{i=1}^{\mathcal{N}} {b_{i}^{2}} {s_{i}^{2}}, \\ \text{ where } \quad b_{i} &=& \frac{\partial S_{\mathcal{K}}}{\partial Y_{i}}\left|{~}_{\hat{y}_{i}}\right.. \end{array} $$
(15)

The derivatives \(b_i\) can be computed cheaply using finite differences in the case of black-box system functions, or analytically otherwise. This is based on the assumption that the components are expensive to evaluate, but the system transformation is cheap. Since \(Y_{sys}\) is normally distributed, an analytical expression similar to EI, (7), can be derived for the system level EI as well.
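As an illustration, the first order variance (15) can be evaluated for a black-box system transformation with central finite differences. This is a minimal sketch under the paper's assumption that \(S_{\mathcal{K}}\) itself is cheap to call; any system level variables \(\mathbf{x}_s\) are assumed to be held fixed inside S_K, e.g. via a closure.

```python
import numpy as np

def system_variance(S_K, y_hat, s2, h=1e-6):
    """First order variance (15) of the system response.

    S_K   : cheap system transformation, called on a vector of component values
    y_hat : component Kriging predictions at the current location
    s2    : component Kriging MSE values at the same location
    """
    b = np.zeros_like(y_hat)
    for i in range(len(y_hat)):
        up, dn = y_hat.copy(), y_hat.copy()
        up[i] += h
        dn[i] -= h
        b[i] = (S_K(up) - S_K(dn)) / (2.0 * h)  # b_i = dS_K/dY_i at y_hat
    return float(np.sum(b**2 * s2))             # s_sys^2
```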

Let the optimum on the system response, (2), be denoted by \(d_{\mathcal {K}}\). We can improve on the value \(d_{\mathcal {K}}\) if \(Y_{sys} < d_{\mathcal {K}}\). The expectation of this improvement, \( I_{sys} = \max (d_{\mathcal {K}}-Y_{sys},0)\), can be found by

$$ \underbrace{E[I_{sys}(\mathbf{x})]}_{\text{EI}_{sys}} = {\int}_{I_{sys} = 0}^{I_{sys}= \infty} I_{sys} \frac{\exp \left( - \frac{t_{sys}^{2}}{2} \right)}{\sqrt{2\pi}s_{sys}} dI_{sys}, $$
(16)

where

$$ t_{sys} = \frac{d_{\mathcal{K}} - I_{sys}-\hat{y}_{\text{sys}}}{s_{sys}}. $$
(17)

The standard normal probability density function is given by

$$ \phi(z) = \frac{1}{\sqrt{2\pi}} \exp \left( \frac{-z^{2}}{2} \right). $$
(18)

Plugging (18) into (16), \(\text{EI}_{sys}\) can be written as,

$$\begin{array}{@{}rcl@{}} E[I_{sys}(\mathbf{x})] &=& (d_{\mathcal{K}}-\hat{y}_{\text{sys}}) {\int}_{t_{sys} = -\infty}^{t_{sys} = \frac{d_{\mathcal{K}}-\hat{y}_{\text{sys}}}{s_{sys}}} \phi(t_{sys})dt_{sys}\\ &&- s_{sys} {\int}_{t_{sys} = -\infty}^{t_{sys}= \frac{d_{\mathcal{K}}-\hat{y}_{\text{sys}}}{s_{sys}}} t_{sys} \phi(t_{sys}) dt_{sys}. \end{array} $$
(19)

The first integral in (19) can be recognized as the normal cumulative distribution function \(\mathrm{\Phi}(\frac{d_{\mathcal{K}}-\hat{y}_{\text{sys}}}{s_{sys}})\). The second integral can be solved by using the substitution \(z=\frac{-t_{sys}^{2}}{2}\). Under the condition that \(\hat{y}_{\text{sys}}<d_{\mathcal{K}}\), the final analytical expression for the infill sampling criterion at system level, \(\text{EI}_{sys}\), is

$$\begin{array}{@{}rcl@{}} E[I_{sys}(\mathbf{x})] &=& (d_{\mathcal{K}}-\hat{y}_{\text{sys}}) \mathrm{\Phi} \left( \frac{d_{\mathcal{K}}-\hat{y}_{\text{sys}}}{s_{sys}} \right)\\ &&+ s_{sys}\phi\left( \frac{d_{\mathcal{K}}-\hat{y}_{\text{sys}}}{s_{sys}}\right). \end{array} $$
(20)

The location of the global maximum of (20) gives the next infill sampling location \(\mathbf{x}_{new}\).
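In code, (20) is the same closed form as (7) with \(d_{\mathcal{K}}\), \(\hat{y}_{\text{sys}}\) and \(s_{sys}\) substituted in; a minimal sketch:

```python
from scipy.stats import norm

def ei_sys(y_sys_hat, s_sys, d_K):
    # System level EI (20); zero where the propagated error estimate vanishes.
    if s_sys <= 0.0:
        return 0.0
    z = (d_K - y_sys_hat) / s_sys
    return (d_K - y_sys_hat) * norm.cdf(z) + s_sys * norm.pdf(z)
```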

The algorithm is referred to as Bilevel Efficient Global Optimization (BEGO). We illustrate the main idea of the algorithm via the following example,

$$\begin{array}{@{}rcl@{}} S(\mathbf{c}(x)) \!&=&\! \sin(c_{1}(x))+\cos(c_{2}(x)),\\ \text{ where } &\quad& c_{1}(x)\,=\, 10x^{4}, \ c_{2}(x) = 10(1-x)^{3}, \ x \in [0,1]. \\ \end{array} $$
(21)

The system response is a function of two components that in turn are a function of the same variable x. Figure 2 shows the plot of \(S(\mathbf{c}(x))\), \(c_1(x)\) and \(c_2(x)\). The system response has three local minima, two of which are very close in terms of their objective value.

Fig. 2 The components \(c_1(x)\) and \(c_2(x)\) and the system response \(S(\mathbf{c}(x))\), plotted in the domain \(x \in [0,1]\)

We construct Kriging metamodels of the components \(c_1(x)\) and \(c_2(x)\) based on a set of initial samples. The true component and system responses are treated as black boxes. The system response based on metamodels may be expressed as,

$$\begin{array}{@{}rcl@{}} S_{\mathcal{K}}(\mathcal{K}_{\mathbf{c}}(x)) &=&\sin(\mathcal{K}_{c1}(x))+\cos(\mathcal{K}_{c2}(x)),\\ && x \in [0,1]. \end{array} $$
(22)

Figure 3 shows the Kriging metamodels, \(\mathcal{K}_{c1}\) and \(\mathcal{K}_{c2}\), of the components \(c_1\) and \(c_2\), along with the system response \(S_{\mathcal{K}}\) and the system level EI, \(\text{EI}_{sys}\). The metamodels have been initialized with three samples at 0.25, 0.5 and 0.75. The MSE for \(\mathcal{K}_{c1}\) and \(\mathcal{K}_{c2}\) and the error estimate for \(S_{\mathcal{K}}\) are also shown in plots (a), (b) and (c), respectively. The error estimate for \(S_{\mathcal{K}}\) is constructed from the individual errors of \(\mathcal{K}_{c1}\) and \(\mathcal{K}_{c2}\) by employing the linear transformation, (15), at each value of x. As expected, the error estimates are zero at the sample locations in plots (a), (b) and (c). Similarly, the plot of \(\text{EI}_{sys}\) shows that no improvement is expected at locations that have already been sampled. \(\text{EI}_{sys}\) is maximal at x = 0, and this location would be chosen as the new infill sampling location for this iteration of the algorithm.
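The chain of steps behind Fig. 3 can be reproduced with the sketches introduced earlier (fit_kriging, system_variance and ei_sys); fit_kriging() remains a hypothetical helper, and in one dimension a simple grid search over \(\text{EI}_{sys}\) suffices:

```python
import numpy as np

def c1(x): return 10.0 * x**4                  # expensive in a real problem
def c2(x): return 10.0 * (1.0 - x)**3

def S_K_of_Y(Y):                               # cheap system transformation
    return np.sin(Y[0]) + np.cos(Y[1])

X0 = np.array([[0.25], [0.5], [0.75]])         # the three initial samples
m1 = fit_kriging(X0, c1(X0).ravel())           # hypothetical helper (above)
m2 = fit_kriging(X0, c2(X0).ravel())

grid = np.linspace(0.0, 1.0, 1001)
pred = np.array([[m1.predict([x])[0], m2.predict([x])[0]] for x in grid])
y_sys = np.array([S_K_of_Y(Y) for Y in pred])
d_K = y_sys.min()                              # optimum of the cheap system

scores = np.empty_like(grid)
for j, x in enumerate(grid):
    s2 = np.array([m1.predict([x])[1], m2.predict([x])[1]])
    s_sys = np.sqrt(system_variance(S_K_of_Y, pred[j], s2))
    scores[j] = ei_sys(y_sys[j], s_sys, d_K)
x_new = grid[np.argmax(scores)]                # next infill location
```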

Fig. 3 The Kriging metamodels of components \(c_1(x)\) and \(c_2(x)\) and the system response \(S_{\mathcal{K}}\), plotted along with their respective error estimates in (a), (b) and (c). Plot (d) shows the system level expected improvement \(\text{EI}_{sys}\)

Figure 4 shows a flowchart of the main steps involved in system optimization using the Kriging based system EI. After constructing the component metamodels in Step 4, the new infill sampling location \(\mathbf{x}_{new}\) is estimated in Step 5. This involves a few substeps. First, the optimum \(d_{\mathcal{K}}\) of \(S_{\mathcal{K}}\) has to be estimated. Thereafter, the linear transformation of the component level MSE is performed to estimate the system level error, \(s_{sys}^{2}\). The system error estimate \(s_{sys}^{2}\) is used along with the optimum \(d_{\mathcal{K}}\) to evaluate \(\text{EI}_{sys}\) in (20). The global maximum of (20) gives the new sampling location \(\mathbf{x}_{new}\). In Step 6 the component responses are evaluated at \(\mathbf{x}_{new}\). If the stopping criterion has been reached, then the argument of \(d_{\mathcal{K}}\) (namely, the minimizer of \(S_{\mathcal{K}}\)) is returned as the final system optimum; otherwise, the algorithm returns to Step 4, and the loop is repeated until termination.

Fig. 4 Flowchart of the steps involved in system optimization

3.3 Constrained optimization of systems

If the system involves constraints then the sampling criterion must also be adapted to deal with these constraints. Similar to the formulation of system-level EI above, a probability of feasibility measure at system level can be derived to ensure that samples are added in areas of interest for constrained optimization instead of unconstrained optimization.

Let \(S_h\) represent the transformation of the component metamodels that yields the constraint response \(\hat{h}_{sys}\),

$$ \hat{h}_{sys} = S_{h}(\hat{y}_{1}(\mathbf{x}_{1}),\hat{y}_{2}(\mathbf{x}_{2}),\dots, \hat{y}_{\mathcal{N}}(\mathbf{x}_{\mathcal{N}}),\mathbf{x}_{s}). $$
(23)

The error in the constraint response \(\hat{h}_{sys}\) can be derived in a manner that is completely analogous to the system error, \(s_{sys}\), found for the objective. The system level variance for the constraint is denoted by \(s^{2,(h)}\). The probability of feasibility, \(\text{PF}_{sys}\), for the constraint \(S_h\) can be expressed as,

$$ P[F(\mathbf{x}) < h_{min}] = \mathrm{\Phi} \left( \frac{h_{{min}}-\hat{h}_{sys}}{s^{(h)}} \right). $$
(24)

where \(h_{min}\) is the constraint limit that dictates whether the constraint is feasible or infeasible. The product of the system level EI, \(\text{EI}_{sys}\), and the system level probability of feasibility, \(\text{PF}_{sys}\), gives a criterion for sampling a constrained problem. For a single constraint this criterion can simply be expressed as,

$$ \text{EI}_{sf} = \text{EI}_{sys} \cdot \text{PF}_{sys}. $$
(25)

The maximizer of (25) is then sampled before constructing the component metamodels again. If there are multiple constraints, the product of the individual probabilities of feasibility of all constraints replaces \(\text{PF}_{sys}\) in (25).
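A sketch of the constrained score (25), reusing ei_sys from Section 3.2; the generalization to several constraints simply multiplies the individual feasibility terms:

```python
from scipy.stats import norm

def ei_sf(ei_sys_val, h_hats, s_hs, h_mins):
    """Constrained infill score (25): EI_sys times the feasibility terms (24).

    h_hats, s_hs, h_mins : per-constraint predictions h_sys, error estimates
    s^(h) and limits h_min (h_min = 0 in standard notation).
    """
    pf = 1.0
    for h_hat, s_h, h_min in zip(h_hats, s_hs, h_mins):
        pf *= norm.cdf((h_min - h_hat) / s_h)   # PF_sys for one constraint
    return ei_sys_val * pf
```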

4 Numerical test problems

The algorithm is tested on several test problems to investigate its ability to estimate the optimum of the system accurately and consistently. The performance of the algorithm is compared against treating the entire system as a single component and applying optimization using the EGO method (Jones et al. 1998). In the following discussion we refer to the system decomposed into several components as DEC (DEComposed), and to the system treated as a single component as SAC (System-As-Component). We also compare the performance of the approach with a fixed sample size LHS scheme for constructing and optimizing an approximation of the SAC response.

The advantages of using an adaptive sampling scheme for the decomposed system problem are exhibited by comparing the effectiveness of the method with optimizing a constrained system response based on component metamodels built via one-shot LHS. A disadvantage of an adaptive sampling scheme such as BEGO is that the user is required to solve a multi-modal optimization problem, (20), at every iteration, albeit on a cheap response. The optimization problem is multi-modal because EI = 0 at each sample location that has already been simulated. Various methods for estimating the global optimum of this problem are mentioned in Kleijnen (2015, p. 268). In this context, it should be noted that we operate under the assumption that the components are so computationally expensive that the internal computational complexity of BEGO is negligible in comparison. If the computational cost of the underlying components is of the same order as the internal computation cost of the algorithm, then the user may be better served by one-shot LHS.

4.1 Algorithm evolution example

Before performing an in-depth analysis, we illustrate the evolution of the algorithm on the one-variable system consisting of two non-linear components that was introduced in Section 3, (21). The plot of the reference components and system response is given in Fig. 2.

Figure 5 shows five iterations of the algorithm, after it has been initialized with three samples at 0.25, 0.5 and 0.75 within the domain x ∈ [0, 1]. The component metamodels and the system response, along with their respective error estimates, are given in plots (a), (b) and (c) for each iteration. Plot (d) shows the system EI found at each iteration. Each subsequent iteration updates the component metamodels with \(\mathbf{x}_{new}\), the location of the maximum value of \(\text{EI}_{sys}\) attained in the previous iteration. By the fifth iteration the maximum \(\text{EI}_{sys}\) falls to an insignificantly small value and the global optimum of the problem, S = −1, is found at x = 0.0196. The error estimates for the components and the system response also drop quickly. At the fifth iteration the component metamodels and the system response \(S_{\mathcal{K}}\) match the reference in Fig. 2 and the respective error estimates are also very low.

Fig. 5 The evolution of the method as new sampling points are added at each iteration. The Kriging metamodels of components \(c_1(x)\) and \(c_2(x)\) and the system response \(S_{\mathcal{K}}\) are plotted along with their respective error estimates in (a), (b) and (c). Plot (d) shows the system EI, \(\text{EI}_{sys}\)

The DEC and SAC algorithms were applied 20 times on the illustrative system problem, (21). Both methods were initialized with two random sampling locations for each replication. The average number of iterations needed by DEC to converge to the optimum was 6.6, while the corresponding number of iterations for SAC was 14.5. In part, this disparity is caused by the difference in nonlinearity of the system and component responses, which require different numbers of samples to approximate appropriately. To further illustrate this aspect, additional test problems are discussed below.

4.2 Unconstrained test problems

The algorithm is tested on a set of one-dimensional systems based on the following numerical problem,

$$\begin{array}{@{}rcl@{}} S_{p}[\mathbf{c}(x)] &=& \sin(c_{1}(x))+\cos(c_{2}(x)),\\ \quad c_{1}(x)&=& px, \ c_{2}(x) = (75-p)x, \ x \in [0,1], \end{array} $$
(26)

where p changes from p = 0 to p = 70 in steps of 10. This results in 8 different system problems with different values of the frequency p. Figure 6 shows the number of iterations needed by SAC and DEC to find the optimum for each of these 8 problems. Each problem was initialized with two random sampling locations for both SAC and DEC. In order to ensure that the result is not skewed by fortuitously hitting the optimum, both algorithms are only terminated when the optimal solution is repeated over 2 consecutive iterations. The bar chart shows the average performance over 20 runs for both methods. DEC found the optimum for each problem using fewer than 5 iterations. On the other hand, SAC needed more than 15 iterations in all cases. For all values of p, the functions had multiple local optima that were relatively close to each other, so the problem was relatively difficult to optimize even in one variable.

Fig. 6 Comparison of the average number of simulations, based on 20 runs, needed by DEC and SAC to converge to the optimum of the problem. The horizontal axis shows the value of the problem parameter p

A question that arises from the clear difference between the performance of DEC and SAC in Fig. 6 is why DEC does so much better than SAC on all problems. The reason is that the metamodels that DEC has to build are very simple linear functions that require relatively few function calls to fit. On the other hand, SAC has to construct the metamodel of the system response, which is highly nonlinear. Therefore SAC requires many more iterations to fit the response well enough to estimate the optimum correctly.

To test this argument, we apply SAC and DEC to the following problem,

$$\begin{array}{@{}rcl@{}} S_{q}[\mathbf{c}(x)] &=& c_{1}(x) + c_{2}(x),\\ \quad c_{1}(x)&=& (5(x-0.5))^{2}+20\sin(qx),\\ c_{2}(x) &=& (5(x-0.5))^{2}-20\sin(qx), \ x \in [0,1], \end{array} $$
(27)

where the frequency q changes from q = 10 to q = 100 in steps of 10. This results in ten problems with differing frequency of the components. The system transformation is linear while the component responses are highly nonlinear. It should be noted, however, that if SAC is used to optimize (27) then the response is always the same smooth quadratic function, no matter what value is chosen for q. This is due to the fact that the sinusoidal parts of the components cancel each other at system level.

Both SAC and DEC are initialized with two random samples. Again, in order to ensure that the algorithm is not terminated when it accidentally finds the optimum, both algorithms are stopped when the optimum is repeated over two consecutive iterations. Based on an average of 20 runs, SAC requires only 5 total function calls to locate the optimum. Figure 7 shows the performance of DEC for different values of q. For all values of q, DEC requires more simulations than SAC to converge to the optimum. Interestingly, though, the difference between the performance of SAC and DEC is not as dramatic in Fig. 7 as was the case for the problems shown in Fig. 6. This example nevertheless demonstrates that in certain specific cases, such as the one described above, a system problem with independent components should not be decomposed. The choice of whether or not to decompose the problem depends on the relative nonlinearity of the system and the components.

Fig. 7 The average number of simulations, based on 20 runs, needed by DEC and SAC to converge to the optimum of the problem. The horizontal axis shows the value of the problem parameter q

The algorithm is next tested on the three-hump camel function, which is an established test problem (Surjanovic and Bingham 2013) for unconstrained optimization in two dimensions,

$$\begin{array}{@{}rcl@{}} f_{camel} &=& 2{x_{1}^{2}}-1.05{x_{1}^{4}}+\frac{{x_{1}^{6}}}{6}+ x_{1} x_{2}+{x_{2}^{2}},\\ &&x_{i} \in [-5,5] \forall i \in \mathcal{N}. \end{array} $$
(28)

In the decomposed system level problem, we assume that there are three expensive components and the rest of the function is cheap to compute. The problem is decomposed and written in terms of the system and components as,

$$\begin{array}{@{}rcl@{}} S_{camel}[\mathbf{c}(\mathbf{x})] &=& 2c_{1}(x_{1})-1.05{c_{1}^{2}}(x_{1})+\frac{{c_{1}^{3}}(x_{1})}{6}\\ &&+c_{2}(x_{1},x_{2})+c_{3}(x_{2}),\\ \quad c_{1}(x_{1}) &=& {x_{1}^{2}}, \ c_{2}(x_{1},x_{2}) = x_{1} x_{2}, \ c_{3}(x_{2}) = {x_{2}^{2}},\\ &&x_{i} \in [-5,5] \forall i \in \mathcal{N}. \end{array} $$
(29)

The proposed algorithm is applied 100 times on the system in (29). The component models are constructed using 4 initial samples based on LHS. The number of expensive simulations is limited to 12. Therefore the algorithm can run for 8 iterations.

The mean performance of the 100 runs of the system level approach is compared against applying optimization on a Kriging metamodel of the original function in (28) which is constructed using 12 expensive simulations chosen via LHS. The Kriging metamodel is also constructed 100 times based on 100 different combinations of LHS samples and the mean performance is analyzed.

Table 1 shows the comparison of the two approaches along with the reference global optimum of the function. The mean and standard deviation of the objective value at the global optimum location found by both methods are shown. The objective values shown have been generated on the original function, (28), as a post-processing step. The mean optimum found by the adaptive system sampling approach for the decomposed problem is closer to the reference solution than the one found by LHS. In addition, the standard deviation for the system sampling scheme is much lower than that of LHS. This indicates that performing adaptive system sampling of the decomposed problem in (29) is more efficient than applying LHS on the original problem in (28). However, a comparison of the performance of LHS and the adaptive sampling scheme on a decomposed problem would also be interesting. In the following subsection we show this comparison for the more widely applicable case of a constrained problem.

Table 1 Comparison of adaptive system optimization of the decomposed Camel problem, (29), versus application of LHS on the original Camel function, (28). The mean and standard deviation of the global optimum found by each method are shown, based on 100 runs

4.3 Constrained test problem

The algorithm is now applied on a constrained benchmark problem. The objective is a modified version of the Rastrigin function (Mühlenbein et al. 1991). The constrained problem is given below,

$$\begin{array}{@{}rcl@{}} f(\mathbf{x}) = 0.01\left( 100 + \sum\limits_{i = 1}^{10}\left[ {x_{i}^{2}}-10 \cos(0.2 \pi x_{i}) \right] \right)^{2}\\ \textrm{s.t.} \ 35 - \sum\limits_{i=1}^{4} x_{i}^{2} \leqslant 0,\\ x_{i} \in [-5.12,5.12] \forall i \in \mathcal{N}. \end{array} $$
(30)

The problem consists of 10 variables. We choose to decompose the problem into a system with four expensive components. Based on this decomposition, the system can be written as,

$$\begin{array}{@{}rcl@{}} S\left[ \mathbf{c}(\mathbf{x}) \right] = 0.01 \left( 100+c_{1}-10c_{2} +c_{3}-10c_{4} \right)^{2}\\ \textrm{s.t.} \ 35 - c_{1} \leq 0,\\ c_{1} = \sum\limits_{i=1}^{4} {x_{i}^{2}}, \ c_{2} = \sum\limits_{i=1}^{4} \cos(0.2 \pi x_{i}),\\ c_{3} = \sum\limits_{i=5}^{10} {x_{i}^{2}}, \ c_{4} = \sum\limits_{i=5}^{10} \cos(0.2 \pi x_{i}),\\ x_{i} \in [-5.12,5.12] \forall i \in \mathcal{N}. \end{array} $$
(31)

The system non-linearly transforms the response of the four non-linear components. The components \(c_1\) and \(c_2\) are a function of four variables, which are shared across both components. On the other hand, \(c_3\) and \(c_4\) are a function of six different variables, which are also shared across these two components. The component \(c_1\) appears in both the objective and the constraint. The metamodel of \(c_1\) can therefore be used both in the system transformation, \(S_{\mathcal{K}}\), of the objective and in the system transformation, \(S_h\), of the constraint.
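Written out in code, the decomposition (31) looks as follows; the analytic expressions below stand in for what BEGO treats as four expensive black-box components whose Kriging predictions feed the two cheap system transformations (a sketch, not the benchmark driver used in this study):

```python
import numpy as np

def c1(x): return np.sum(x[0:4]**2)                      # shared with constraint
def c2(x): return np.sum(np.cos(0.2 * np.pi * x[0:4]))
def c3(x): return np.sum(x[4:10]**2)
def c4(x): return np.sum(np.cos(0.2 * np.pi * x[4:10]))

def S_obj(c):
    # Objective transformation of (31) on the four component responses.
    return 0.01 * (100.0 + c[0] - 10.0 * c[1] + c[2] - 10.0 * c[3])**2

def S_con(c):
    # Constraint transformation; feasible when S_con(c) <= 0 (h_min = 0).
    return 35.0 - c[0]
```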

The decomposed problem in (31) is used to compare the performance of BEGO with an LHS based strategy. For this problem, the LHS based approach also builds metamodels of the component responses rather than of the system response. Both methods are run 20 times on the proposed problem, (31), and the mean performance is compared.

In the case of BEGO, the component metamodels are initialized with a different number of samples based on the number of dimensions of the component. The four dimensional components \(c_1\) and \(c_2\) are initialized with 20 samples, while the six dimensional components \(c_3\) and \(c_4\) are initialized with 30 samples. The algorithm is allowed to run for 80 iterations. This means that at termination, \(\mathcal{K}_{c1}\) and \(\mathcal{K}_{c2}\) are based on 100 samples, while \(\mathcal{K}_{c3}\) and \(\mathcal{K}_{c4}\) are based on 110 samples.

The LHS based approach is therefore given a total budget of 100 samples for \(c_1\) and \(c_2\) and 110 samples for \(c_3\) and \(c_4\). The system response \(S_{\mathcal{K}}\) based on the LHS metamodels is optimized and the average result of the 20 runs is analyzed.

We also compare the performance for the case in which both LHS and BEGO are given 25 fewer expensive simulations. This means that BEGO is terminated after 55 iterations. For LHS, on the other hand, \(\mathcal{K}_{c1}\) and \(\mathcal{K}_{c2}\) are constructed using 75 samples and \(\mathcal{K}_{c3}\) and \(\mathcal{K}_{c4}\) using 85 samples.

Figure 8 shows the average global optimum found at each iteration of BEGO, based on 20 runs. The objective is evaluated, as a post-processing step, on the reference system response at the optimal location found at each iteration of BEGO. The reference global optimum is also found on the reference system. The error bars indicate the standard deviation around the optimum. After about 60 iterations of BEGO, the algorithm has converged to a value close to the reference global optimum and the standard deviation is also relatively low.

Fig. 8 The mean and standard deviation of the global optimum found at each iteration of BEGO, based on the 20 runs, for the test problem defined in (31)

Table 2 shows the comparison of the statistics of the global optimum found by LHS and BEGO. Once again the objective values given here have been found by evaluating the optima found by each method on the reference system response as a post-processing step. The results indicate that BEGO performs better than LHS in terms of mean closeness of the optimum to the reference solution, given the same number of component evaluations. The standard deviation of BEGO is also lower than for LHS. Interestingly, for LHS with a higher number of samples, 2 out of the 20 optimal locations found result in a constraint violation. On the other hand, the optimal locations of BEGO do not violate the constraint for any of the 20 runs. Comparing the result of the higher number of evaluations (second set of rows in Table 2) with the result for the lower number of evaluations, we note that convergence is steadily taking place for both approaches. The mean and standard deviation for BEGO based on the lower number of evaluations are both better than the corresponding numbers for LHS with a higher number of evaluations.

Table 2 Comparison of the mean and standard deviation of the global optimum found by BEGO and LHS, based on 20 runs, for small and large numbers of observations

Using the constrained and unconstrained problems, we have evaluated the performance of BEGO in different settings. The method converges well on the problems and also performs better than a space-filling based strategy. The primary advantage of the adaptive technique lies in reducing the computational budget needed to find the global optimum. Since each extra component simulation significantly increases the time and computational cost of optimization, any reduction in the number of simulations brings a significant efficiency improvement. We now test the algorithm in a more practical setting by applying it to an engineering problem.

5 Engineering problem: Optimization of an optical filter

5.1 Background of the problem

We perform system optimization of an integrated photonic serial ring resonator that behaves as an optical filter. Light is confined in a high refractive index SiN core layer which is buried inside a lower refractive index SiO2 cladding layer. The light is guided inside these optical integrated circuits via the principle of total internal reflection. Figure 9 shows the top-view schematic of the system. When light at a certain wavelength is launched into the waveguide (black paths) at the 'In' port, it partially couples into the adjacent waveguide. This coupled light travels through the first ring section and partially couples again into the second ring section. A portion of this coupled light is dropped at the 'Drop' port. The rest of the light exits via the 'Through' port.

Fig. 9 Top-view schematic of a second order serial optical ring resonator. The cloud encompasses the complete system. The colored boxes represent the six expensive to evaluate components

The amount of coupling can be varied by changing the gap and the length of the coupling sections (area enclosed by the colored boxes in Fig. 9). This change in coupling gives rise to a change in the optical response at the ‘Drop’ and ‘Through’ port. Different optical filter responses can therefore be generated by varying the coupling ratio of each coupler. The objective in this study is to achieve the band-pass filter response given in Fig. 10 at the ‘Through’ port.

Fig. 10 Desired band-pass filter spectral response (red line) of the serial ring resonator. No preference is specified in the intervals [0.1, 0.2] and [0.8, 0.9]

In order to compute the power coupling ratio of a coupler, two expensive electromagnetic simulations have to be performed. The first simulation computes the power coupling ratio, \(P_{L0}\), when the coupling length is zero. The second simulation computes the beat length, \(L_{\pi}\), which is defined as the coupling length needed to completely couple light from the first waveguide into the second waveguide and back into the first. Since the exchange of light between adjacent waveguides follows a sinusoidal curve, the two parameters \(P_{L0}\) and \(L_{\pi}\) are enough to compute the coupled power at any coupling length for a given gap (Syms and Cozens 1992). Figure 11 shows the power coupling ratio as a function of the coupling length, based on simulated values of \(P_{L0}\) and \(L_{\pi}\) for a given gap.

Fig. 11 Power coupling ratio as a function of the coupler length for a certain coupler gap

Once the power coupling ratio of each coupler is known, the transfer function at the 'Through' port can be cheaply computed as a function of the normalized frequency using scattering matrix analysis (Ahmed et al. 2011). Let \(H(n_f)\) represent the transfer function for the normalized frequency \(n_f \in [0, 1]\). We can then define a system objective for the desired band-pass spectral response as,

$$\begin{array}{@{}rcl@{}} \mathrm{\underset{\mathbf{x} \in \mathbb{X}} {min}} \quad \frac{1-b}{2} \left\| \bar{H}_{stop1} \right\|_{p} &+& b \left[ 1-\left\| 1-\bar{H}_{pass} \right\|_{p} \right]\\ &+&\frac{1-b}{2} \left\| \bar{H}_{stop2} \right\|_{p}, \end{array} $$
(32)

where \(\bar{H}_{stop1}\), \(\bar{H}_{pass}\) and \(\bar{H}_{stop2}\) represent the vectors of responses for \(n_f \in [0, 0.1]\), \(n_f \in [0.2, 0.8]\) and \(n_f \in [0.9, 1]\), respectively. The five design variables of the problem, depicted in Fig. 9, are \(\mathbf{x} = [w \ g_1 \ g_2 \ g_3 \ L]\), where \(w \in [1, 1.15]\,\mu\)m, \(L \in [0, 2400]\,\mu\)m and all the gaps are in the range \([1, 1.3]\,\mu\)m. If p is large then the objective essentially involves minimization of the weighted sum of the maximum value in the two stop bands and the minimum value in the pass band. The p-norm is used instead of the maximum and minimum values in order to ensure that the objective remains continuously differentiable. The use of the p-norm also enables faster convergence for the algorithm, since the objective is relatively easier to model. The weights are based on the value of b. In this work, we choose p = 20 and b = 0.6. The value of p was chosen such that there is a balance between smoothness of the objective and the need to closely approximate the maximum value.
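On a sampled frequency grid, the objective (32) reduces to a few vector norms. The sketch below is a direct transcription of (32) with the values p = 20 and b = 0.6 used in this work; the argument H is assumed to hold the transfer function evaluated on the grid nf:

```python
import numpy as np

def filter_objective(nf, H, p=20, b=0.6):
    """Band-pass objective (32) on a sampled transfer function.

    nf : grid of normalized frequencies in [0, 1]
    H  : transfer function at the 'Through' port evaluated on nf
    """
    pnorm = lambda v: np.linalg.norm(v, p)
    stop1 = H[(nf >= 0.0) & (nf <= 0.1)]
    passb = H[(nf >= 0.2) & (nf <= 0.8)]
    stop2 = H[(nf >= 0.9) & (nf <= 1.0)]
    return (0.5 * (1 - b) * pnorm(stop1)
            + b * (1.0 - pnorm(1.0 - passb))
            + 0.5 * (1 - b) * pnorm(stop2))
```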

5.2 System optimization

The system response, S, can be modeled in terms of two expensive components that are repeated three times in the second order resonator, since each coupler requires two expensive simulations and there are three couplers in the system. Figure 9 shows the six components \(c_1\) to \(c_6\) as well as the five design variables of the problem, \([w \ g_1 \ g_2 \ g_3 \ L]\). We treat the responses \(P_{L0}\) and \(L_{\pi}\), generated for each coupling section, as the components of the problem, since this suits the mathematical structure of the problem. The expensive components are only a function of the width and the respective gaps. The components \(c_1\), \(c_3\) and \(c_5\) give the power coupling ratio \(P_{L0}\) for each coupler, while the beat length \(L_{\pi}\) for each coupler is found by components \(c_2\), \(c_4\) and \(c_6\).

A commercial electromagnetic solver, PhoeniX Software, is used to compute \(P_{L0}\) and \(L_{\pi}\). Both simulations require approximately 10 minutes. Initial Kriging metamodels for \(c_1\) and \(c_2\) are built based on \(n_d = 9\) samples for w and \(g_1\) chosen via LHS. We assign \(g_2\) and \(g_3\) the same initial sample values as \(g_1\). Since the design domain for all the components is the same, we can essentially use \(\mathcal{K}_{c1}\) as the approximate response for \(P_{L0}\) for all three components \(c_1\), \(c_3\) and \(c_5\). Similarly, \(\mathcal{K}_{c2}\) can be used to give the beat length \(L_{\pi}\) for components \(c_2\), \(c_4\) and \(c_6\).

Once the component metamodels that predict \(P_{L0}\) and \(L_{\pi}\) for each coupler have been constructed, the remaining operations to evaluate the system objective are performed at system level. In order to evaluate the objective to be optimized, (32), the power coupling ratio of each coupler is needed. This operation is performed at system level since \(P_{L0}\) and \(L_{\pi}\) can be cheaply predicted by the Kriging metamodels for each coupler. The computation of the transfer function via scattering matrix analysis involves simple operations on small matrices and is therefore also performed at system level. Once the transfer function is known, it is plugged into (32) to find the system objective.

A total computational budget of 21 expensive simulations for evaluating \(P_{L0}\) and \(L_{\pi}\) is reserved in addition to the 9 initial simulations. At each iteration of the system algorithm, \(P_{L0}\) and \(L_{\pi}\) are simulated three times, for the three different combinations \([w \ g_1]\), \([w \ g_2]\) and \([w \ g_3]\) of the infill sampling location suggested by the algorithm. The total number of simulations available therefore translates to 7 iterations of the system optimization algorithm, since 7 × 3 = 21 simulations. The three new \(P_{L0}\) responses at each iteration are all added to the Kriging metamodel \(\mathcal{K}_{c1}\). Likewise, \(\mathcal{K}_{c2}\) is augmented with the set of three new beat length responses \(L_{\pi}\) at each iteration.

Figure 12 shows the optimal system response found after 7 iterations of BEGO. The approximate system response \(S_{\mathcal{K}}\) is plotted along with the system response obtained when the optimal result is fed into the expensive component simulators. It can be seen from the figure that the system response based on the component metamodels closely resembles the actual system response at the global optimum. The optimum found by BEGO performs quite well in the pass band region, i.e. almost all the light in the normalized frequency region [0.2, 0.8] is passed through. The amount of power in the stop bands [0, 0.1] and [0.9, 1] is still not very low. This is to be expected since we are optimizing a second order filter, i.e. there are only two rings in the serial resonator structure. Increasing the filter order will further improve the performance.

Fig. 12 The approximate system response and the reference system response at the global optimum found by BEGO

Table 3 shows the location at which the global optimum of the serial ring resonator is found. Although we do not impose the restriction that \(g_3 = g_1\), the optimal result produces a resonator with symmetric gaps. The numerical objective values for S and \(S_{\mathcal{K}}\) confirm the result shown in Fig. 12. Table 4 gives the response of the individual components at the global optimum. As expected, the Kriging metamodel responses for \(P_{L0}\) and \(L_{\pi}\) for each directional coupler section are close to the reference component responses.

Table 3 The location of the global optimum is given along with the objective on the approximate system and the reference system
Table 4 The responses of the individual components at the global optimum are given for the metamodels and the reference simulator

6 Conclusion

In this work, we have developed an efficient strategy for global optimization of systems involving independent components. The approach, referred to as BEGO, targets systems involving expensive to evaluate components that do not interact with each other. Kriging metamodels were employed to construct the responses of the expensive components. A novel system level infill sampling strategy was derived in order to sample the components in regions of interest for system optimization. A linear Taylor series approximation of the system transformation of the Kriging components was performed in order to obtain an analytical expression for system level EI. The infill sampling criterion was modified for constrained system optimization by deriving a system level probability of feasibility for each constraint.

The system level optimization approach was first compared with treating the problem as a single component and applying EGO (Jones et al. 1998) to obtain the optimum. Similarly, the effectiveness of the technique was compared with building a response surface of the system via LHS and globally optimizing the cheap response. Both comparisons showed that, in general, there is a clear efficiency improvement when optimization is performed on the decomposed problem. However, if the component responses are highly nonlinear while the system response is smooth and simple, then decomposition may not always result in an efficiency improvement.

We demonstrated the advantages of using an adaptive sampling scheme for the decomposed system problem by comparing the effectiveness of the strategy with performing optimization of a system response based on component metamodels constructed using LHS.

The two approaches were applied on a modified and constrained version of the Rastrigin function. Based on an average of 20 runs, BEGO converged faster to the optimum of the problem than the LHS based optimization strategy.

In addition, BEGO was applied on an engineering system involving an optical filter based on an integrated photonic serial ring resonator. The engineering problem consisted of a system with six uncoupled components. It was shown that BEGO was able to determine the global optimum of the system problem using only a limited number of expensive component simulations.

The algorithm is especially relevant for problems in which multiple similar components are part of the system. In such a situation the same metamodel can be used to construct the response of several components. Another advantage of employing the bilevel framework is that the number of dimensions of each component metamodel is often much lower than the total number of dimensions at system level. This enables the approximate system response to converge quickly to the actual response with each passing component level simulation. Since the component response is treated as a black box, the method is applicable to any hierarchical system with low dimensional components.

In this work, we have addressed optimization of systems involving independent components. A future aim is to extend the algorithm to the case where the components are dependent. We expect that the extension of the algorithm to systems of dependent components will have to draw upon methods developed within the field of Multidisciplinary Design Optimization (Martins and Lambe 2013).