Metamodel-based multidisciplinary design optimization methods for aerospace system

The design of complex aerospace systems is a multidisciplinary design optimization (MDO) problem involving the interaction of multiple disciplines. However, because of the necessity of evaluating expensive black-box simulations, the enormous computational cost of solving MDO problems in aerospace systems has also become a problem in practice. To resolve this, metamodel-based design optimization techniques have been applied to MDO. With these methods, system models can be rapidly predicted using approximate metamodels to improve the optimization efficiency. This paper presents an overall survey of metamodel-based MDO for aerospace systems. From the perspective of aerospace system design, this paper introduces the fundamental methodology and technology of metamodel-based MDO, including aerospace system MDO problem formulation, metamodeling techniques, state-of-the-art metamodel-based multidisciplinary optimization strategies, and expensive black-box constraint-handling mechanisms. Moreover, various aerospace system examples are presented to illustrate the application of metamodel-based MDOs to practical engineering. The conclusions derived from this work are summarized in the final section of the paper. The survey results are expected to serve as guide and reference for designers involved in metamodel-based MDO in the field of aerospace engineering.


Introduction
In practice, the development of aerospace systems involves sophisticated system engineering. Aerospace systems (e.g., satellites and spacecraft) are multidisciplinary schemes consisting of several inter-coupled disciplines or subsystems, including orbit, power, propulsion, structure, thermal control, attitude control, and payload. Multidisciplinary design optimization (MDO) has been widely employed in engineering systems design practices to improve design performance and reduce design costs in the early development process [1]. The concept of MDO was initially developed by Sobieszczanski-Sobieski [2]. As a methodology, MDO focuses on the design of complex engineering systems and subsystems by coherently exploiting the synergy of mutually interacting phenomena [2]. It comprehensively utilizes various computational analysis tools and optimization methods to determine the optimal design of systems [3]. Moreover, MDO was first applied to the coupled aero-structural optimization problem for aircraft wings [4]. The merits of MDO have been widely proven in the design practices of engineering systems, such as aircraft [5][6][7][8][9][10], automobiles [11][12][13], ships [14,15], and electric devices [16,17]. In the past decade, MDO has also been applied to aerospace systems, including satellites [18][19][20][21], constellations [20,22,23], and launch rockets [24][25][26][27], to improve the overall system performance.
Although deriving the optimal design of multidisciplinary systems is advantageous, the implementation of MDO is confronted with two crucial problems in practical engineering. The basic problem involves the means by which multiple disciplines are to be coupled. The simplest approach is to implement multidisciplinary analysis (MDA) during optimization, where the coupling variables are directly solved via iterative analysis. In addition, several MDO architectures have been successfully developed to organize different analysis models and computational flow for multidisciplinary optimization. These architectures can be generally classified as monolithic (e.g., all-at-once, individual discipline feasible, and multidisciplinary feasible) architectures and distributed (e.g., cooperative optimization and concurrent subspace optimization) architectures [3], with respect to whether sub-optimization is required. Another critical problem involves achieving efficiency in solving MDO problems with limited computational resource. Modern aerospace system designs typically employ high-fidelity disciplinary simulation models (e.g., finite element analysis (FEA) model with large grids) to enhance the quality and reliability of design optimization results. However, the evaluation of expensive simulation models and iterative MDA computing processes further considerably increase the associated computational cost of MDO problems in aerospace systems. Moreover, because of the lack of transparency in simulation models (e.g., simulations implemented using commercial computer-aided engineering (CAE) software), the MDA process is generally referred to as an expensive black-box function, whose gradient is expensive or unreliable for calculation. Thus, most existing heuristic or gradient-based numerical methods (e.g., genetic algorithm and sequential quadratic programming) are unsuitable for solving computation-intensive blackbox optimization problems.
To reduce the computational cost of expensive blackbox optimization problems, metamodel-based design optimization (MBDO) techniques have been developed over the past two decades [28]. These methods are also called surrogate-assisted analysis and optimization methods [29]. They involve the construction of a metamodel (or a surrogate) based on a set of samples to approximate the original expensive simulation models for analysis or optimization. These MBDO methods can also be applied to solve MDO problems. In this case, an MDO problem is essentially treated as a general constrained nonlinear optimization problem, where the computationally intensive MDA process is replaced with metamodels to reduce the number of expensive function calls. In recent years, several state-of-the-art MBDO methods have been developed [30]. Different MBDO methods are distinguished by their metamodeling techniques and design space exploration strategies. The combination of the MBDO and MDO methodologies can be referred to as metamodel-based MDO. Note that in practice, the purpose of metamodel-based MDO is to overcome the burden of optimization rather than build or solve MDO architectures.
A number of reviews regarding MDO have been reported in the literature [1,3,31,32]. Many surveys on metamodel-based design and optimization have also been conducted in recent years [28,29,33]. However, comprehensive surveys on metamodel-based MDOs are rarely performed [34,35]. Focusing on the design of aerospace systems, this paper introduces the fundamental methodology and technology of state-of-the-art metamodel-based MDO techniques. The remainder of this paper is organized as follows. Section 2 introduces the problem formulation and overall architecture of the metamodelbased MDO. The common metamodeling techniques are described in Section 3. In Section 4, the metamodelbased multidisciplinary optimization strategies are discussed in detail. The survey of expensive black-box constraint-handling mechanisms for metamodel-based MDOs is presented in Section 5. Moreover, several realworld aerospace system MDO examples are discussed in Section 6. Finally, some conclusions and future works on metamodel-based MDOs are summarized in Section 7.
MDO was employed to determine the optimal design parameters for the aerospace system.
In MDO, the inter-coupled relationships, data exchange, and simulation flow among different disciplines can be organized in terms of the design structure matrix, as illustrated in Fig. 2. In the matrix, the diagonal elements (i.e., shaded blocks) represent different disciplines involved in the aerospace system. The terms above the diagonal represent feed-forward variables and parameters, whereas those below represent backward information. The design structure matrix explicitly expresses the constitution of the MDO problem. Based on the constructed matrix, the general mathematical formulation of an aerospace system MDO problem can be represented as where f (X) is the objective of the aerospace system MDO problem (e.g., total mass and lifecycle cost); N d is the number of disciplines; X is the vector of design variables; X i and Y i are the design variables and output (i.e., state variables) of the ith discipline, respectively; X LB and X UB are the lower and upper bounds of design variable, respectively; Y ij is the coupling state variables from the ith discipline to the jth discipline; and are the state equations (i.e., disciplinary analysis model) and constraints of the ith discipline, respectively.
Note that because of feedback coupling state variables, Y FV = {Y ij |i > j}, the MDA process must derive a consistent solution at each sample point during the metamodel-based optimization process. This means that the solution (D i (X i , Y i , Y ij ), i = 1, 2, · · · , N d ) must satisfy all state equations. In practice, the MDA process can be organized via a fixed-point-based iteration approach, as summarized in Algorithm 1.

Overall procedure of metamodel-based optimization
Because modern aerospace system design generally involves computationally expensive simulation models, MBDO techniques are employed to solve MDO problems to reduce the computational cost. In the metamodelbased MDO process, the entire MDA process is evaluated to obtain a consistent design at each sample point. The metamodels are constructed to approximate the MDA process for multidisciplinary optimization. In this case, the responses of expensive black-box objectives and constraints can be rapidly and inexpensively predicted by metamodels. Note that metamodels can be gradually refined via adaptive sampling during the optimization process to further reduce the computational cost. Moreover, Fig. 3 illustrates the overall procedure of metamodelbased aerospace system MDO [19]. The major steps are as follows: Step 1: The aerospace system MDO problem, including the design space, objective, constraints, and MDA models to be optimized, is clearly defined. The algorithm parameters (e.g., the number of initial samples and termination criterion) of the selected MBDO methods for solving the MDO problem are configured.
Step 2: A number of sample points are generated by the design of the experiment in the design space. Then, the MDA process is invoked to obtain the objective and constraint responses at each sample point. The sample points and their associated responses (referred to Fig. 3 Overall procedure of metamodel-based MDO for aerospace systems (reproduced with permission from Ref. [19], © IAA 2017).
as samples) are stored in the sample dataset.
Step 3: Based on existing samples in the sample dataset, metamodels are constructed to approximate the objective and constraint responses from the costly MDA process. If the static metamodel-based optimization strategy is used, then the approximation accuracy of metamodels must be verified, as discussed in Section 2.3.
Step 4: The constructed metamodels are used to replace the original expensive MDA process for optimization. Global numerical optimization techniques, such as genetic algorithms, are employed to directly optimize the metamodels instead of the original expensive MDA models, as given by Eq. (2): where m is the number of constraints, andf (X) and g i (X) are the metamodels of the objective and the ith constraint, respectively. If the dynamic metamodel-based optimization strategy is used, the metamodels are adaptively refined during optimization to efficiently search for the optimum, as discussed in detail in Section 4.
Step 5: The termination criterion is checked to determine whether the optimization must be terminated. The common termination criteria in metamodel-based MDO include the decrement criterion (C 1 ) and computational cost criterion (C 2 ) [36], as given in Eq. (3). In C 1 , if the error or relative error between the objective values in two consecutive iterations is less than the tolerance (ε OPT ), then the optimization terminates. In C 2 , if the number of existing samples (N e ) exceeds the predefined maximum number of function evaluations (NFE max ), then the optimization terminates.
Step 6: The optimized design for the aerospace system MDO problem is finally obtained when the termination criterion is reached. For the single-objective optimization problem, the best feasible solution in the sample dataset is outputted as the final solution. For the multi-objective optimization problem, the non-dominated designs in the sample dataset are outputted as a Pareto solution.

Classification of metamodel-based MDO methods
Metamodel-based MDO methods may be generally classified into two categories in terms of whether the metamodels are updated during optimization: static and dynamic metamodel-based optimization strategies ( Fig. 4) [36].
In the static metamodel-based optimization strategy, metamodels are constructed once based on sufficient samples for optimization. This is the most convenient approach to implement the metamodel-based MDO process. For instance, some expensive disciplinary models can be replaced with well-constructed metamodels for MDA and optimization. Note that to ensure the confidence in their results, the approximation accuracy of the constructed metamodels should be verified before optimization. Techniques that are widely used for accuracy validation include split and crossover validation methods. In the split validation method, the samples are divided into two groups. One group of samples is applied to construct the metamodels, and the other is used to validate their accuracy. For the crossover validation method, several samples are randomly selected for validation, whereas the remaining samples are used for metamodeling. The validation process is repeated until the prediction errors in all the samples are obtained. The most commonly used approximation accuracy indices for metamodels are summarized in Table 1 [36]. For example, if the complex correlation coefficient (R 2 ) exceeds 0.9, then the metamodel can be assumed to be sufficiently accurate for optimization.
In contrast to constructing a static metamodel once, the metamodels in the dynamic metamodel-based optimization strategy are adaptively refined according to certain criteria or mechanisms during optimization. Overall, the dynamic metamodel-based optimization strategy can be further divided into infill sampling criterion-based methods and space reduction-based methods. Compared with the static metamodel-based optimization strategy, the dynamic metamodel-based optimization strategy, which has become popular in recent years, is generally more efficient in practice. This study mainly focuses on surveying the dynamic metamodel-based optimization strategy, as discussed in detail in Section 4.
Root mean square error

Diagram of metamodel-based MDO architecture
Based on the foregoing discussion, Fig. 5 illustrates the architecture of the metamodel-based MDO technique [36]. Before optimization, the MDA models of aerospace systems (e.g., orbit, structure, and attitude control) must be established and parameterized. Then, MBDO techniques are employed to optimize the MDO problem by evaluating the MDA process. During the optimization, metamodels are constructed and refined to explore the design space. The optimization stops when the termination criterion is satisfied. The optimized solution is outputted for further analysis.

Design of experiments
The design of experiments aims at generating sample points in the design space to represent the numerical characteristics of the system. Considering the limited computational resource in practice, the design of experimental methods is expected to provide favorable projective uniformity and space-filling uniformity performance. In the metamodel-based aerospace system MDO process, Latin hypercube is the most commonly used design among experimental methods. It can generate sample sets with arbitrary quantities and dimensions.
To further improve the sampling performance, in recent years, optimal Latin hypercube design methods have been developed based on certain criteria, such as minimum distance [37], entropy [38], and energy [39]. For instance, two typical optimal Latin hypercube design methods, that is, the native lhsdesign function in the MATLAB optimization toolbox [40] and ESEA-OLHD [41], are illustrated in Fig. 6. The lhsdesign function is implemented by randomly generating several groups of sample points and selecting the group with the best space-filling ability. Moreover, ESEA-OLHD generates sample points by optimizing the space-filling criteria via numerical optimization algorithms. This shows that the random sample points generated by the optimal Latin hypercube design methods can uniformly fill the design space, thus improving the metamodeling performance in practice. In recent years, some novel OLHD methods specific for metamodelbased optimization have been developed [42][43][44].
After the generation of sample points, the associated responses at each sample point are evaluated by calling the MDA process. The samples are then utilized to construct metamodels for MDO.

Typical metamodels 3.2.1 Polynomial response surface method
The polynomial response surface method constructs a multivariate linear regression function to fit the costly simulation model or MDA process [45]. A polynomial response surface metamodel can be written aŝ  where N v is the dimensionality of design variables; β (0) , β (i) , β (ij) are the coefficients estimated through the least squares method. The coefficient matrix, β, is given by where y is the response vector of training sample points, and Φ is the matrix relevant to the training sample points.

Radial basis functions
Radial basis function is an interpolation method based on the function value at sample points [46]. A radial basis function metamodel can be formulated aŝ where N t is the number of training sample points; ϕ r (∥x − x i ∥), i = 1, 2, · · · , N t , is the radial function; and ω i is the weight coefficient of radial function. The coefficient vector (ω) can be calculated as follows: The typically used radial functions are listed in Table 2 [36], where is the Euclidean distance is ∥x − x i ∥.
The approximate accuracy of this metamodel is influenced by shape coefficient c, which is determined by the Linear distribution of sample points and function information. Typically, c can be estimated using Eq. (8) [36]:

Kriging
The Kriging model is a type of unbiased optimal estimation interpolation model that combines a global approximation model and a stochastic process. It is formulated using Eq. (9):f where µ(x) is the global approximation model, which reflects the variation trend of the expensive MDA process and is usually set as a constant; Z(x) represents a Gaussian process with zero mean and variance (σ 2 ). Given a set of sample points, X = {x 1 , x 2 , · · · , x Nt }, the covariance matrix is given by Eq. (10): where R(·) is a symmetric correlation matrix, and R(x i x j ) is the Gaussian correlation function between sample points x i and x j , as shown in Eq. (11): where θ k is the correlation parameter determined by maximizing the likelihood function in Eq. (12): The estimated variance (σ 2 ) and mean values (μ) can be calculated using Eq. (13): where Y T is the column vector of the responses of sample points; 1 is a 1 × N t unit vector. The prediction value and standard deviation of the unvisited point using the Kriging model are formulated in Eq. (14):

Ensemble method
In general, different metamodels exhibit different approximation performance levels for different problems. To exploit various metamodels, different metamodels can be combined to formulate an ensemble metamodel to improve the approximation capability [47], as shown in Eq. (15):f wheref Ensemble is the ensemble of different metamodels, f i (x) is the ith metamodel, N esm is the number of different metamodels, and α i is the weight of the corresponding metamodel. In practice, the weights (α i ) can be determined with respect to the approximation error of each metamodel [47,48]. If the computational cost is acceptable, the weights can be directly optimized using numerical methods by minimizing the cross-validation error of the ensemble metamodel [43,49].

Machine-learning-based metamodeling method
Because the essence of metamodels in optimization is relatively similar to the regression task in machine learning, some machine learning regression algorithms have been applied to metamodeling in practice. The representative method is support vector regression [50,51]. Support vector regression is based on the principle of support vector machines with slight variations and has proven to be effective in solving regression problems [51]. Artificial neural networks are also common in metamodeling fields [52]. An artificial neural network is a multilayer feedforward network that approximates the targeted nonlinear function; the network parameters are trained via numerical methods (e.g., back-propagation algorithm). As a hierarchical composition of Gaussian process-based metamodels (e.g., Kriging), deep Gaussian processes have attracted considerable interest in recent years owing to their promising nonlinear approximation performance [53][54][55].

Multi-model fusion methods
Most conventional metamodeling techniques and metamodel-based optimization processes simply utilize high-fidelity analysis models that are accurate but computationally intensive. Thus, the computational burden for solving MDO problems remains heavy because of the excessive number of costly samples. However, real-world aerospace system designs generally involve multi-fidelity analysis models, such as structural FEA models with fine or coarse grids and orbital transfer models considering perturbations and eclipses. Low-fidelity analysis models are less accurate but computationally inexpensive. Multimodel fusion methods (or multi-fidelity methods) have been proposed to exploit multi-fidelity analysis models [56][57][58][59][60]. In this approach, a large number of lowfidelity points are generated to capture the trends of system responses, and a small number of high-fidelity points are used to calibrate the trend. The basic form of a multi-model fusion metamodel is given by Eq. (16). In terms of the method for determining the regression (ρ) and discrepancy (δ(x)) items, multi-model fusion methods can be divided into correction-based and Bayesian methods.

Correction-based method
The correction-based method is a straightforward multimodel fusion method, where the metamodel of the lowfidelity model (f LF (x)) and that of the discrepancy (δ(x)) are constructed. The regression item (ρ) is incorporated to minimize the difference between the scaled low-fidelity model approximation (ρf LF (x)) and high-fidelity model response (f HF (x)). Traditionally, ρ is a scalar, but modeling it as a function of x is an area of research [61,62]. Based on Ref. [59], radial basis function and Kriging are the most widely used metamodels in the correction-based multi-model fusion method.

Bayesian method
The Bayesian multi-model fusion method, also known as the Co-Kriging method, was introduced by Kennedy and O'Hagan [63]. Co-Kriging constructs the covariance between high-fidelity and low-fidelity models to import the assistance of the low-fidelity model; in this method, ρ and the hyperparameters ofδ(x) are both estimated. Forrester et al. applied Co-Kriging to the design optimization field by combining it with the Bayesian model update criterion to balance global exploration and local exploitation [56]. In recent decades, Co-Kriging variants have become one of the most popular multi-model fusion methods [59]. Different variants of Co-Kriging have also been proposed to improve the approximation accuracy [64][65][66][67] and reduce the metamodeling cost [66][67][68][69].
To illustrate the concept of multi-model fusion intuitively, a one-dimensional (1D) numerical example is presented. This example has been widely reported in the literature to demonstrate the effects of multi-model fusion methods [56,57,59,66,70]. High-fidelity and low-fidelity models are formulated in Eq. (17). Here, the parameters adopt A = 0.5, B = 10, and C = −5. A Kriging model is constructed using pure high-fidelity samples. A Co-Kriging model is constructed using both high-fidelity and low-fidelity samples. As illustrated in Fig. 7, Kriging inadequately approximates f HF , whereas the Co-Kriging calculation is relatively close to f HF with the support of low-fidelity data. (17) 4 Metamodel-based multidisciplinary optimization strategy

Infill sampling criterion-based optimization strategy
In the dynamic metamodel-based optimization strategy, one widely used method for updating metamodels is to sequentially allocate the newly added sample points in the design space according to a certain infill criterion. In this study, two representative infill sampling criterionbased optimization strategies are introduced: efficient global optimization methods and mode-pursing sampling methods.

Efficient global optimization method
The well-known efficient global optimization algorithm generates infill sample points in the design space [71], where the expected improvement is maximized to balance the exploration and exploitation of optimization. The formulation of the expected improvement is given by Eq. (18): where Φ(·) and ϕ(·) are the Gaussian probability distribution function and probability density function, respectively; y min is the minimum objective function value among existing sample points. As shown in Eq. (18), if the newly added infill sample point is not the same as existing sample points, then EI(x) > 0; otherwise, To illustrate the efficient global optimization process intuitively, a 1D numerical example (Eq. (19)) is investigated.
The optimization process is illustrated in Fig. 8. The figure shows that the sample point with the maximum    EI(x) in each iteration is selected to update the Kriging model. In addition, EI(x) decreases with iteration, indicating the convergence tendency of the optimization process. After the 3rd iteration, the optimization process converges to the global optimum.
To further improve the optimization capacity, some variants of efficient global optimization have been developed in recent years, especially in constraint handling [72], parallel infill sampling [73], and highdimensional optimization [74]. Further details of the variants of efficient global optimization are discussed in Refs. [75][76][77][78].

Mode-pursing sampling methods
The mode-pursing sampling method proposed by Wang et al. is another typical infill-criterion-based optimization strategy, where the infill criterion is implemented based on a constructed cumulative probability function [79]. In the traditional mode-pursing sampling method, the biased cumulative sum of approximated objective values of numerous inexpensive points constructs the cumulative distribution function, as given in Eq. (20): whereĜ(i) is the cumulative distribution function, G(i) is the cumulative sum, G min is the minimum cumulative sum, r(R 2 ) is the bias control factor determined by the complex correlation coefficient (R 2 ) of the metamodel, N cg is the number of inexpensive point groups, and n g is the number of each group. The optimization process of mode-pursing sampling is illustrated in Fig. 9 [79], using a six-hump camel-back problem [80], as formulated in Eq. (21): Fig. 9 Mode-pursing sampling optimization procedure.
In the mode-pursing sampling procedure, numerous inexpensive points are randomly generated in the design space and sorted in ascending order with respect to their metamodel responses,f (x i ), as shown in Fig. 9(a). Then, f (x i ) is subtracted from the maximumf (x i ) to obtain a nonnegative difference ( Fig. 9(b)), which is averaged over a group of n g inexpensive points to obtain G(i) (Fig. 9(c)). Then, G(i) is normalized asĜ(x i ), as shown in Fig. 9(d). Here,Ĝ(i) satisfies the requisites of a cumulative distribution function, such as that monotonically increasing and varying between 0 and 1. New sample points are selected from inexpensive points based onĜ(i), and the selected probability of inexpensive points in the ith group is equal toĜ(i). The control factor, r(R 2 ), biasesĜ(i) toward the inexpensive point groups with smallerf (x i ) values, as illustrated in Fig. 9(e). In other words, the more accurate the metamodel, the more likely inexpensive points (f (x i )) are selected. BecauseĜ(i) is positive throughout the design space, and mode-pursing sampling adopts an elite strategy, the global convergence property can be theoretically proved easily [79,81].
Because of the potential global optimization performance of the mode-pursing sampling framework, a range of variants has been proposed for different optimization problems by customizing the expression or application scenario ofĜ(i). For instance, Kazemi et al. proposed the constraint-importance mode-pursuing sampling method for constrained optimization problems by introducing penalty items intoĜ(i) [82]. Sharif et al. developed discrete variable mode-pursing sampling for discrete variable optimization problems by replacing a continuous objective with a discrete objective and utilizing a doublesphere strategy to improve the local exploitation ability [83]. Shan and Wang proposed a Pareto set pursing method for multi-objective optimization problems by aggregating multiple objectives into a single fitness function that reflects the dominance of sample points [84]. Other variants have also been developed for high-dimensional expensive black-box optimization problems [81,85] or high-dimensional expensive constrained black-box optimization problems [86].

Space reduction-based optimization strategy
The space reduction-based optimization strategy is another widely used approach in practice. In this approach, a small subspace in the design space is constructed or identified. This subspace is referred to as the region of interest, where the global optimum is located with high probability. During optimization, a number of newly added samples are generated in the region to gradually improve the fitting quality of metamodels in the vicinity of the global optimum. In general, the space reductionbased optimization strategy includes three major steps: (1) determining the center of the region; (2) calculating the size of the region; and (3) trimming the region according to the design space. The region is identified, as illustrated in Fig. 10. In this paper, several representative identification methods for the ROI are introduced as follows.

Significant design space method
The significant design space method is a simple spacereduction method [94]. In this method, the region of interest is constructed around the best sample (x * k ) at the kth iteration. The space size (B k ) is determined according to the metamodel approximation accuracy in the vicinity of x * k , as shown in Eq. (22): where e k is the metamodel approximation accuracy in the vicinity of x * k ; f * k andf * k are the objective function and metamodel prediction values at x * k , respectively; ϑ k is the size scale indicator; and e a is the acceptable deviation typically in the range of 0.001-0.050. Based on Eq. (22), if the metamodel approximation accuracy is satisfied (i.e., ϑ k > 1), then B k is increased to B k−1 /α to explore the global optimum; otherwise, B k is decreased to B k−1 · α to improve the metamodel approximation accuracy and exploit the local optimum. If the length in any dimension is no less than the threshold, then the kth trial significant design space is defined as k ], as shown in Eq. (23): To prevent the significant design space from exceeding the entire design space, S 0 , the kth space, S k , is defined as the intersection between S t k and S 0 , as illustrated in Fig. 10.

Trust region method
The trust region method is a widely used space reduction method for refining metamodels, where the size of the region of interest is adjusted according to the prediction performance of the objective improvement [102], as shown in Eq. (24): where ∆f is the prediction performance of objective improvement. In Eq. (24), a larger ∆f indicates that the optimization process can lead to a better optimum. A smaller ∆f indicates that the improvement in the objective is insignificant. Note that the objective cannot be improved if t is negative. The size of the region can be determined in terms of the radius of the trust region, as shown in Eq. (25): where δ k+1 is the radius of the trust region at the (k+1)th iteration; x * k and x * k−1 are the best sample points at the kth and (k − 1)th iterations, respectively; c 1 and c 2 are the shrinking and amplification factors, respectively; and ∆ is the upper bound of the trust region radius. As proved in Ref. [102], the trust region-based optimization strategy can converge to the local optimum, which can also converge to the global optimum. In recent years, some trust region method-based adaptive optimization methods have been developed to further improve the optimization performance, as detailed in Refs. [103][104][105].

Interesting sampling region method
The interesting sampling region method is a machinelearning-assisted method for locating the potential region containing the global optimum [106]. In this method, the existing samples are classified into two categories with respect to their responses. If the objective value at the sample point is less than the predefined threshold, P thresh , the sample is labeled as a superior sample, indicating that the sample is probably located close to the global optimum. A binary machine learning classifier (e.g., support vector machine [107] and Bayes classifier [108]) is constructed based on labeled samples. Then, a larger number of inexpensive geometry points are generated via Latin hypercube design in the design space in which superior inexpensive points are identified by the trained classifier. The cluster center of superior inexpensive points is determined to depict the potential location of the global optimum. For instance, consider the six camelhump functions; the superior inexpensive points (i.e., dots labeled with red circles) under different classification thresholds (P thresh ) are illustrated in Fig. 11. The figure shows that the potential location can be depicted by the cluster of superior inexpensive points using a proper P thresh value.
Finally, the interesting sampling region at the kth iteration is defined as a hypercube subregion, as shown in Eq. (26): where the current pseudo-optimum (x (k) ISR ) is the center of the region; x * pse is the cluster center of superior inexpensive points (also regarded as the potential position of global optimum); R (k) is the Euclidean distance between x (k) ISR and x * pse ; and η is the shrinking coefficient used to adjust the size of R (k) . As the optimization proceeds, the interesting sampling region is dynamically scaled based on the Euclidean distance between x considerably improving the optimization efficiency and global convergence. Once the current pseudo-optimum (x (k) ISR ) is far from the potential global optimum (i.e., , the exploration region must be enlarged to include areas that may not have been covered, thus avoiding missing the true optimum. Additionally, if x (k) ISR is probably in the vicinity of the global optimum, then the search region can be reduced to improve the approximation accuracy of the constructed metamodels.
In addition to the aforementioned methods, many other space reduction methods have been developed. For instance, Wang et al. proposed an adaptive response surface method to solve computation-intensive design problems, where regions with inferior sample points are cut for sampling in the reduced design space and updating the metamodels [109]. In addition, some variants of the adaptive response surface method were investigated to further improve optimization convergence and robustness [110][111][112]. Dong et al. proposed a new space reduction-based optimization algorithm to solve the unconstrained expensive black-box optimization problems, where a score-based reduced subspace around the current best sample is created to accelerate the local convergence [113]. Qiu et al. proposed self-organizing maps and fuzzy clustering-based three-stage space reduction and metamodeling optimization methods to improve efficiency and robustness performance [114]. Liu et al. developed a Monte Carlo method and space reduction strategy to improve the efficiency of the sampling process [115].

Surrogate-assisted evolutionary algorithms
Motivated by the idea of merging surrogates into the evolutionary process, surrogate-assisted evolutionary algorithms have received considerable attention in recent years [116,117]. In the optimization of surrogate-assisted evolutionary algorithms, metamodels are developed as "surrogates" of expensive simulation models for stochastic evolutionary operations. This can effectively alleviate the computational complexity in engineering optimization. Several potential solutions (called individuals or particles) are also concurrently selected as newly added sample points to improve the surrogate approximation accuracy in the vicinity of the optimum. The fundamental architecture of the algorithm is graphically illustrated in Fig. 12.
The selection of newly added infill sample points from the offspring population is found to be a crucial step in the algorithm [116]. The commonly used selection criteria can be generally divided into three categories: performancebased, uncertainty-based, and hybrid criteria [118]. In the performance-based criterion, the infill sample points are selected by considering only the predicted fitness values [117,119,120]. Generally, this criterion cannot be exclusively used because optimization probably falls into the local optima once the surrogates fail to approximate the simulation model well. The uncertainty-based criterion chooses infill sample points with considerable uncertainties, effectively improving the accuracy of surrogates and leading the search to unexplored regions [121]. However, numerous simulation model evaluations are performed to explore regions with sparse samples, resulting in a low convergence rate [118]. The hybrid criterion simultaneously incorporates performance and uncertainty [119,122]. The primary objective of this criterion is to balance the surrogate approximation accuracy and population competitiveness during optimization.

Metamodel-based optimization enhanced by machine learning
Machine learning is a well-known state-of-the-art technology that has been successfully applied to big data mining, natural language processing, image recognition, medical diagnosis, and so on. Similarly, a considerable amount of research has been conducted to integrate machine learning techniques (e.g., classification learning, cluster analysis, and dimensionality reduction) into engineering optimization. Classification learning is among the most widely used machine learning techniques in optimization and has considerable potential for identifying the feasibility of a solution during the optimization process. Motivated by this concept, several studies have been performed to improve the optimization performance by employing a support vector machine or its variants for identifying superior inexpensive points [140][141][142][143][144]. Additionally, other classification learning techniques, such as decision tree [145] and k-nearest neighbor [146], are employed for optimization. To capture the potential regions effectively, classified points are further categorized based on a set of user-defined characteristics or attributes using cluster analysis techniques. For instance, cluster analysis can be combined with a support vector machine to identify a potential search region by clustering numerous superior inexpensive points, as described in Refs. [21,106].
In multi-objective optimization, cluster analysis can also be implemented to improve the spatial uniformity of the Pareto frontier when determining newly infilled sample points or reference vectors [135,136,147].
To alleviate the computational complexity caused by the "curse of dimensionality", the dimensionality reduction technique provides a promising approach for MBDO to solve high-dimensional optimization problems. One typical approach is to conduct sensitivity analysis before optimization starts. Here, primary design variables are reserved for optimization, whereas minor ones are neglected and set as fixed values during optimization [148]. Although the dimensionality of the optimization problem is reduced by sensitivity analysis, the optimality of the obtained optimized solution probably decreases owing to the irrelevance of several design variables. Another dimensionality reduction method is the utilization of manifold learning (e.g., principal component analysis and Sammon mapping) to map existing sample points together with alternative infill sample points from high-dimensional space into low-dimensional space. A low-dimensional surrogate is also trained to select infill sample points as optimization proceeds [149,150].
In modern engineering design, because new problems are generally derived from a series of previously solved tasks, the concept of transfer optimization has been recently developed. It is a newly emerged methodology for improving the optimization performance of a new optimization task by mining existing knowledge [151]. Although numerous transfer optimization-based methods have been recently investigated [152][153][154], only a few studies on MBDO methods using transfer optimization have been reported [155]. Therefore, further improving the optimization performance by integrating the transfer optimization technique into MBDO methods is promising and valuable [151].
In addition to the aforementioned data-driven machine learning techniques, physics-informed machine learning techniques, also known as physics-informed neural networks [156,157], have emerged as an alternative to illposed and inverse problems. Fundamental physical laws and domain knowledge are embedded by exploiting observational data [158,159], tailoring neural network architecture for physics constraints [160,161], and/or imposing physics constraints into the loss function [162,163]. The physics-informed neural network is expected to outperform existing machine learning methods in partially understood, uncertain, and high-dimensional problems. It is also another potential research topic in the area of metamodel-based MDO.

Expensive black-box constraint-handling mechanism
In engineering practice, MDO problems in aerospace systems generally involve various constraints, such as the natural frequencies of a structure and end-of-life power in orbit. Note that because most constraints arise from time-consuming black-box simulation models (e.g., structural FEA model), directly calculating the gradient information of the constraints via numerical methods (e.g., finite difference method) to search for the feasible domain is difficult. Thus, in engineering practice, the solution of MDO problems with numerous expensive black-box constraints remains complex. To overcome this obstacle, some constraint-handling mechanisms have been integrated with metamodel-based MDOs to further improve the optimization capacity.

Penalty function-based methods
Penalty function-based methods are convenient approaches that are most commonly used for handling expensive black-box constraints. Here, the objective function is augmented by constraints to cope with penalty factors, as given in Eq. (27): where λ is the penalty factor, and F (X) is the augmented objective function. The metamodel of the augmented objective function is established for optimization, and the infeasible sample point is penalized with respect to the penalty factor scale. Although penalty function methods are conveniently implemented in practice, determining the proper penalty factor is difficult. If the penalty factor is extremely small, the optimization process is virtually unable to converge to the feasible domain. If the penalty factor is extremely large, the penalty part can overlap the objective, rendering convergence difficult. Nevertheless, the penalty function method remains widely employed in MBDO methods. For instance, a predefined penalty function is employed to handle the expensive constraints [164]. This method is further enhanced in Ref. [110], where the penalty factor is automatically adjusted according to the constraint violation value.

Filter-based methods
In view of the difficult implementation of penalty function-based methods, penalty-free methods have become popular in recent years; among these latter methods, filter-based methods are the most representative. The concept of a filter was originally proposed by Fletcher [165] and was successfully applied to many penalty-free numerical optimization methods [166][167][168]. To illustrate the concept of a filter, the constraint violation function of the MDO problem is defined as follows: h(X) = max{g 1 (X), g 2 (X), · · · , g m (X)} (28) where h(X) denotes the constraint violation function. A positive h(X) represents an infeasible sample point, and a larger h(X) generally indicates inadequate infeasibility. Based on the definition of the Pareto non-domination set, the concept of the filter is clarified as follows: According to the above definitions, the construction of a filter basically means the creation of a Pareto front in terms of f (X) and h(X). A filter in the f (X)-h(X) space is illustrated in Fig. 13 [21], where the dots represent the non-dominated sample points. If a newly added sample point is located in the shaded area, then it is not dominated by existing points in the filter, that is, the filter is augmented. If it is located in the hatched area, then the newly added point can dominate at least one point in the filter (i.e., the filter is refined); otherwise, the point is rejected by the filter. In fact, the filter generates a set of blocks for newly added sample points, as indicated by the solid lines in Fig. 13. In constrained optimization procedures, the filter functions as a classifier for accepting or rejecting a sample point, considering both optimality and feasibility. Note that a sample point can be accepted by the filter only if it improves either the optimality or feasibility; hence, the feasibility and optimality of existing sample points are gradually enhanced during the metamodel-based optimization process.  A number of MBDO methods have been developed to solve expensive constrained black-box problems based on the filter concept. For instance, the filter technique is combined with a support vector machine to identify a potential sampling sub-region in the vicinity of the global feasible optimum [21]. In Ref. [72], the probability of constrained improvement is enhanced; only the sample point with a positive probability of constrained improvement can be accepted by the filter to enhance optimality and feasibility.
In addition to the aforementioned constraint-handling mechanisms, certain customized methods have also been developed to handle constrained optimization. In Refs. [16,169,170], all constraints are approximated by metamodels to avoid the complicated determination of the penalty factor. Moreover, some sequential updating techniques considering the confidence intervals of Kriging models for constraint approximation were developed [171,172]. To further solve problems with numerous constraints, the constraint aggregation method, which lumps numerous constraints into one or a few constraints to significantly reduce the computational cost, has drawn considerable interest in recent years. The Kreisselmeier-Steinhauser function [173] is a well-known constraint aggregation method that has been used in various applications [86,174,175]. Moreover, based on Refs. [176,177], Kreisselmeier-Steinhauser function-based MBDO methods are efficient in solving expensive black-box constraint problems.
6 Metamodel-based aerospace system MDO examples

All-electric geostationary Earth orbit satellite MDO problem
The all-electric geostationary Earth orbit satellite MDO problem is referred from Ref. [19]. The studied allelectric geostationary Earth orbit satellite is a cuboid with a payload module, service module, and solar array. Four Hall-effect thrusters are installed at the bottom of the satellite to execute attitude control, geosynchronous transfer, and geostationary Earth orbit station-keeping maneuvers, as shown in Fig. 14 [19]. The all-electric geostationary Earth orbit satellite MDO problem involves geosynchronous transfer, geostationary keeping, power, thermal control, attitude control, and structure disciplines. The design structure matrix of the MDO problem is shown in Fig. 15 [19], and the involved analysis models of disciplines are detailed in Ref. [19]. The satellite MDO problem was formulated using Eq. (29) [19]. The design variables are steer angles (α, β, φ) during low-thrust geosynchronous transfer, positions of electric thrusters (d T , d N ), battery capacity (C s ), area of solar arrays (A sa ), area of thermal radiator (A r ), angular momentum of reaction wheel (H w ), and thickness of composite material structure (SH, CH, TBH, SP, CP, TBP). These are optimized to minimize the total mass of the satellite (M satellite ) subject to the constraints of total transfer time (t f ), stationkeeping accuracy (λ max and i max ), available power (P BOL In this example, the filter-based sequential radial basis function method from Ref. [21] was used to solve the MDO problem. The optimization results are summarized in Table 3 [21]. Compared with the conventional genetic algorithm-based multidisciplinary feasible method, the computational cost of the filter-based sequential radial basis function method is reduced by 50%.  Fig. 15 Design structure matrix of all-electric satellite (reproduced with permission from Ref. [19], © IAA 2017).

Earth-observation MDO problem
The Earth observation satellite MDO problem is referred from Ref. [178]. The satellite consists of payload and service cabins, as shown in Fig. 16 [178]. The payloads include a 10-band radiometer and charge-coupled device camera, which are fixed at the bottom of the payload cabin. The solar arrays are set along the south and north faces of the satellite to reduce the influence of external heat flux. Other facilities, such as battery and attitude control subsystems, are placed in the service cabin. The Earth observation satellite MDO problem involves five disciplines: orbit, payload, structure, power, and mass. The design structure matrix of this problem is shown in Fig. 17 [178]. The disciplinary modeling approaches are presented in Ref. [178].
The Earth observation satellite MDO problem also aims to minimize the total mass of the satellite, as formulated in Eq. (30) [178]. The design variables include the aperture and focal length of payloads, area of solar arrays, capacity of storage battery, and thickness of composite material structure. The constraints include the signal-tonoise ratio and resolution of payloads, noise-equivalent temperature difference, surplus power, discharge depth, and natural frequencies.
In this example, the sequential radial basis function using a support vector machine from Ref. [106] was used to solve the MDO problem. The optimization results are listed in Table 4 [106]. The computational cost of the sequential radial basis function using the support vector machine is only found to be 2.25% and 51.24% of the computational costs of genetic algorithm and sequential quadratic programming, respectively.

Satellite constellation MDO problem
The satellite constellation MDO problem is obtained from Ref. [20], where a small satellite constellation is established to achieve a cooperative Earth observation mission.
The design structure matrix of the satellite constellation MDO problem is shown in Fig. 18 [20]. The inter-coupled relationships between the constellation configuration and design of satellite subsystems are considered to enhance system performance. In this example, the constellation is based on the Walker-δ constellation. In this constellation, the ascending nodes of the orbit planes are uniformly distributed around the equator plane, and the satellites are uniformly distributed within the orbital planes at the same inclination [20].
min M system = (m payload + m power + m thermal The constellation MDO problem was formulated using Eq. (31). This is a continuous-discrete mixed optimization problem. The continuous design variables (X c ) for constellation configuration include the orbital altitude (h), inclination (i), and right ascension of the ascending node (Ω); those for satellite subsystems include the optical parameters of the payload, power subsystem parameters, and thickness of structural plates. The discrete design variables (X d ) include the number of orbit planes (P ) and the number of satellites in each orbit plane (S). These design variables are optimized to minimize the total mass of the entire constellation system, subject to the constraints of coverage ratio, payload resolution, signal-to-noise ratio of payload, depth of discharge, power surplus, maximum temperature, and natural frequencies.
In this example, the sequential radial basis function method using a support vector machine for discretecontinuous mixed variables is used to solve the MDO problem. In the optimization method, a discrete-continuous variable sampling method is utilized to handle the discrete variables. The optimization results, summarized in Table 5 [20], are compared with those from the integer coding-based genetic algorithm. This indicates that the sequential radial basis function method considerably decreases the system mass by approximately 28.63%. Moreover, the computational budget of this method com- pared with the integer coding-based genetic algorithm is reduced by more than 85%. The optimized configurations of the constellations are shown in Fig. 19 [20].
In addition to the aforementioned metamodel-based aerospace system MDO examples, metamodeling techniques have also been widely used in the aerospace field. For instance, Xiong et al. proposed an intelligent optimization strategy based on statistical machine learning for spacecraft thermal design, where a neural network metamodel was used to approximate the spacecraftthermophysical model [179]. Smith et al. established a charged drag coefficient metamodel for spacecraft deorbit design with ionospheric drag from a low Earth orbit [180]. Wu et al. developed a double-layer radial basis function metamodel-based optimization method for the liquidgas interface determination of on-orbit refueling [181]. Peng and Wang proposed a dynamic metamodel-based optimization method for spacecraft formation reconfiguration on liberation point orbits to achieve fast path planning [182]. Feldhacker et al. investigated a samplingbased least-squares regression method to optimize the required velocity increments for rendezvous in spacecraft three-body dynamical systems [183]. Peng et al. proposed an adaptive metamodel-based optimization method called the sequential radial basis function for satellite truss structure optimization [184]. Gogu et al. proposed a response surface approximation method to aid in the design of integrated thermal protection systems for spacecraft reentry [185]. From the literature, MBDO methods are reported as effective and promising for solving complex aerospace system optimization problems. Simpson et al. applied Kriging models to an aerospike nozzle MDO problem for global approximation [186].

Conclusions
In this paper, an overview of the metamodel-based MDO and its application to aerospace systems is presented. The formulation of the MDO problem for aerospace systems and the overall procedure of the metamodel-based optimization process are introduced. Moreover, the technical architecture of metamodel-based MDO is summarized. Metamodeling techniques, including design of experiments, metamodels, and multi-model fusion methods, are surveyed with the aim of constructing a reducedorder numerical approximation model to inexpensively predict system responses during optimization. A number of state-of-the-art metamodel-based multidisciplinary op-timization strategies are discussed in detail. Generally, metamodel-based optimization strategies can be classified into two categories: infill criterion-based and space reduction-based optimization strategies. Recently, novel techniques, such as evolution computation and machine learning, have also been integrated with metamodels to further improve optimization performance. Finally, some penalty-based and penalty-free constraint-handling mechanisms are discussed to illustrate how to deal with expensive black-box constraints in the metamodel-based optimization process. Additionally, several real-world aerospace system MDO examples, which can aid researchers in realizing the applications of metamodel-based MDO in engineering practice, are presented. Note that although the contents of this paper are focused on the MDO for aerospace systems, the fundamental methodology and technology can also be applied to the design of other engineering systems, such as ships, aircraft, and automobiles, without significant modifications.
In future work, the optimization capacity of metamodel-based MDO must be further improved. For instance, due to the "curse of dimensionality", highdimensional expensive black-box problems remain complex for metamodel-based optimization. In addition, conventional metamodel-based MDOs generally consider multidisciplinary models as black-box functions. By exploiting prior knowledge (e.g., designer's experience, explicit model expression, and physical laws), the black-box MDO problem can be transformed into a gray-box or even a white-box optimization problem to further reduce cost and improve convergence. Moreover, the combination of metamodeling, MDO, and artificial intelligence has also attracted considerable interest. Thus, with the development of advanced modeling, analysis, and computation technologies, the research on metamodel-based MDO is anticipated to remain attractive. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.