1 Introduction

Turbomachinery is a key aeroengine component, and its efficiency is of great significance to overall aeroengine performance. Aerodynamic design in a low-dimensional design space, particularly the design of 1D velocity triangles, remains the key step in the overall turbomachinery design process [1], because optimizing turbine performance directly in a high-dimensional design space requires a substantial amount of design work and an extended design cycle. However, 1D aerodynamic design requires a high level of design experience and knowledge of physics [2]. Therefore, learning from previous designs using machine learning algorithms is critical for reducing reliance on human experience and improving design efficiency.

With the development of aerodynamic design methods, combining optimization algorithms with 1D mean-line aerodynamic performance evaluation methods or loss models has become an important path toward improving turbine efficiency [3,4,5,6]. Qin [7] used the genetic algorithm (GA) to optimize the first stage of a steam turbine by optimizing the geometric and aerodynamic parameters at the mean radius; the stage efficiency was improved by more than 0.4%. Coull [5] incorporated the blade lift coefficient, flow coefficient, stage loading, and Reynolds number into the 1D aerodynamic performance evaluation and suggested efficiency trends with these factors. Bertini [8] modified the Smith diagram to reflect the design level of modern low-pressure turbines; the differences between real low-pressure turbines and numerical models were analyzed and accounted for in the 1D design, thus improving its robustness. Agromayor [9] developed a 1D aerodynamic empirical loss model that accounts for diffuser performance and combined it with the sequential quadratic programming method to optimize a 1D turbine design; the difference in mass flow rate and output power between the model prediction and experiment was less than 1.2%. Juangphanich [10] combined the 1D design directly with the three-dimensional design to reduce the design cycle, and utilized the differential evolution algorithm for multi-objective optimization of load and stage efficiency in the 1D design.

However, these optimization-based methods have some limitations. First, the iterative "design–evaluation–optimization" process must be repeated for each new design requirement until that requirement is satisfied [11]. This is because such methods do not take prior design knowledge into account, in contrast to human experts, who can identify a near-optimal solution based on prior design experience. Second, the accuracy of the method is limited by the chosen loss model. Each aeroengine design generation needs to accumulate a large quantity of data and establish new loss models to expand the design space and improve model accuracy. This is evidenced by the continuous historical development of loss models, such as the early Kacker–Okapuu loss model [4], which considers fluid compressibility and shock loss, and the loss model for high-lift turbines proposed by Coull and Hodson [5]. If existing high-precision experimental or numerical databases can be used to update loss models in real time, the accuracy of 1D design will be improved and its design space expanded.

Developments in machine learning [12,13,14] and Bayesian optimization [15,16,17] have the potential to address the above problems. Machine learning-based surrogate models can be utilized to alleviate the computational burden of complex systems that require high calculation costs [18]. Meng demonstrated the effectiveness of using an adaptive surrogate model [19] and an uncertainty-based strategy [20] for multidisciplinary design optimization problems. Moreover, transfer learning [12, 14] is a machine learning technique that can extract prior design knowledge from one or more tasks and apply the learned knowledge to the target task. Hence, transfer learning has the potential to address the aforementioned limitations, because it can be utilized to accelerate the design optimization process, improve model accuracy and expand the design space. Turbomachinery aerodynamic design is a field well suited for transfer learning in two respects: first, high-precision samples are limited due to the high cost of computational fluid dynamics (CFD) and experiments; second, design tasks for the same type of turbomachinery are very similar. Design tasks with varying design requirements are typically optimized independently and iteratively. In fact, relevant knowledge about the design variables with respect to the target aerodynamic performance is acquired during each optimization process. However, except for the optimal solutions, such prior knowledge is typically abandoned once the design task is completed. As a result, if prior knowledge from completed design tasks can be used as a transfer source, transfer learning can significantly improve the optimization efficiency of the target design task.

In this paper, a transfer optimization learning method based on transfer learning and Bayesian optimization (BO) is proposed and applied to the 1D aerodynamic design of single-stage turbines. The overall method includes a three-stage modeling process. (1) The design tasks are divided into completed source tasks and the target design task. A Gaussian process-based surrogate model between the design variables and the aerodynamic performance is established for each design task. (2) These surrogate models are probability weighted to build an ensemble model, and the ensemble model is where the knowledge transfer occurs. (3) The ensemble model is adopted as the surrogate model to accelerate BO on the target task. Then, this method is used to optimize a single-stage turbine design problem, and its results are compared with those of other state-of-the-art optimization methods.

This paper is organized as follows. Section 2 illustrates the 1D turbomachinery design problem. Section 3 describes the transfer optimization learning framework. Section 4 compares the results obtained by the proposed method with those of the traditional optimization methods, and discusses the impact of various learning strategies on the learning of the target design task. Then, the results are validated by high-fidelity full three-dimensional (3D) numerical simulation. Finally, Sect. 5 concludes the work and introduces future work.

2 Design problem formulation

In this section, the 1D turbine design problem is described in detail. The 1D design variables, loss model and constraints for the design variables are first introduced, and the overall design optimization problem is then formulated.

2.1 Design variables and loss model

As shown in Fig. 1, in the single-stage turbine design, once the overall turbine design requirements are specified, the corresponding 1D design parameters, including the mean-line velocity triangle and meridian channel path, are usually determined first. A single-stage turbine's velocity triangle is determined by five dimensionless parameters: the loading coefficient (\(\mu\)), the flow coefficient (\(\varphi\)), the degree of reaction (\(\Omega\)), the axial velocity ratio (\({K}_{a}\)), and the circumferential velocity ratio (\({D}_{2m}\)). Here, \({K}_{a}\) represents the ratio of the axial velocity at the inlet and outlet of the rotor blade, and \({D}_{2m}\) represents the ratio of the radius at the outlet and inlet of the rotor blade. Furthermore, because the aspect ratio can impact turbine efficiency and the meridional channel, the stator aspect ratio (\(h{b}_{1}\)) and rotor aspect ratio (\(h{b}_{2}\)) are also chosen as design variables. Therefore, the 1D aerodynamic design parameters of a single-stage turbine are determined by the following 7 design variables

$$x = \left[ {\mu ,\varphi ,\Omega ,K_{a} ,D_{2m} ,hb_{1} ,hb_{2} } \right].$$
(1)
Fig. 1 Overview of the turbine 1D aerodynamic design

In the 1D design, turbine efficiency with respect to the 7 design variables can be evaluated with empirical correlations or loss models, and the optimal design can be obtained through optimization. The loss model used in this paper is the one described in Ref. [21], which has been validated by many research institutes; it can preliminarily evaluate the friction loss, trailing edge loss, secondary flow loss, tip clearance loss, and incidence loss. The loss model can be used as a merit function for optimization algorithms to assess the quality of the single-stage turbine's 1D design parameters. Since the specific heat of the working fluid is related to the temperature and gas composition, the variable heat capacity method is used in this paper. The specific heat capacity at constant pressure is treated as a function of temperature and gas composition, and the turbine 1D aerodynamic calculation is performed using the average specific heat at constant pressure over the inlet and outlet of the turbine stage. For a gas with a known composition, the specific heat at constant pressure of the working fluid is fitted as a polynomial function of temperature using the least-squares method. Then, from the turbine work expressed as a total enthalpy drop, the static temperature at the outlet of the turbine stage is solved iteratively, and the average specific heat capacity at constant pressure is obtained.
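The following Python sketch illustrates this variable heat capacity treatment under stated assumptions: the tabulated \(c_p\)–temperature data, the polynomial degree, the fixed-point iteration and the function names are illustrative placeholders, not the exact procedure or values used in this paper.

```python
import numpy as np

# Hypothetical cp(T) data [J/(kg*K)] for a fixed gas composition (placeholder values).
T_tab = np.array([800.0, 900.0, 1000.0, 1100.0, 1200.0, 1300.0])
cp_tab = np.array([1099.0, 1121.0, 1142.0, 1160.0, 1177.0, 1191.0])

# Least-squares polynomial fit of cp as a function of temperature.
cp_poly = np.polynomial.Polynomial.fit(T_tab, cp_tab, deg=3)

def outlet_temperature(T_in, specific_work, tol=1e-6, max_iter=50):
    """Iteratively solve the stage outlet temperature from the enthalpy drop,
    using the average cp over the inlet/outlet temperature interval."""
    T_out = T_in - specific_work / cp_poly(T_in)  # initial guess with inlet cp
    for _ in range(max_iter):
        cp_mean = 0.5 * (cp_poly(T_in) + cp_poly(T_out))
        T_new = T_in - specific_work / cp_mean
        if abs(T_new - T_out) < tol:
            break
        T_out = T_new
    return T_new, cp_mean

# Example with illustrative numbers: 1175 K inlet, 250 kJ/kg of specific work.
T_out, cp_mean = outlet_temperature(1175.0, 2.5e5)
```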

It should be noted that the merit function of the proposed methods is not limited to the chosen loss model, and higher-fidelity data from experiments or CFD calculations can also be used to further improve the proposed method's accuracy.

2.2 Design constraints

There are some constraints on the aerodynamics and meridional channel of the 1D aerodynamic design parameters. The constraints on the aerodynamics are described first. In general, they consist of three aspects. (1) The reaction at the blade root cannot be negative. (2) The axial inlet flow angle of the blade cascade must be greater than the outlet flow angle to ensure cascade channel convergence. (3) The absolute flow angle and absolute Mach number at the turbine final stage outlet must be within a certain range.

The constraints on the meridional channel are then described. The meridional channel geometry of each turbine stage can be determined by the velocity triangle, aspect ratios of stator and rotor, and the given axial clearance. As shown in Fig. 2, a typical turbine meridian channel is represented by four diffusion angles: the diffusion angle at the stator shroud (\(\theta_{ss}\)), the stator hub (\(\theta_{hs}\)), the rotor shroud (\(\theta_{sr}\)) and the rotor hub (\(\theta_{hr}\)). The constraints on the meridian channel include the maximum outer diameter and the diffusion angle limits at the hub and shroud. The ranges of these diffusion angles are constrained to prevent boundary layer separation due to excessive meridian channel diffusion angles. At the same time, it is also necessary to ensure the continuous development of the hub and shroud profile of the meridian channel and ensure that its first derivative is smooth.

Fig. 2 Meridian channel of a single-stage turbine

2.3 Optimization problem formulation

According to the above analysis, the 1D turbine design optimization is a constrained optimization problem. According to design requirements, all equality constraints (such as mass flow rate and expansion ratio) and inequality constraints (e.g., maximum outer diameter limit, meridian channel diffusion angle, outlet Mach number, degree of reaction at blade root) can be converted to the following penalty function

$$P\left( x \right) = - \mathop \sum \limits_{i = 1}^{m} \left(\frac{{\left| {\varphi_{i} \left( x \right)} \right| - \varphi_{i} \left( x \right)}}{2}\right)^{2} - \mathop \sum \limits_{j = 1}^{l} \kappa_{j}^{2} \left( x \right),$$
(2)

where the first term represents the inequality constraints and the second term represents the equality constraints. Therefore, the initial constrained 1D optimization problem of maximizing the efficiency \(\eta\) can be converted into an unconstrained optimization problem with the following decision function

$$f\left( x \right) = \eta \left( x \right) + rP\left( x \right),$$
(3)

where \(\eta (x)\) can be computed by the loss model, and \(r\) is the weight of the penalty function \(P(x)\).
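As a concrete illustration of Eqs. (2) and (3), a minimal Python sketch is given below. The callables `efficiency` and `constraints` are hypothetical stand-ins for the loss-model evaluation and the constraint checks; only the penalty and decision-function algebra follows the equations above.

```python
import numpy as np

def penalty(ineq, eq):
    """Eq. (2): `ineq` holds inequality-constraint values phi_i(x), taken as
    feasible when phi_i(x) >= 0; `eq` holds equality-constraint residuals kappa_j(x)."""
    ineq = np.asarray(ineq, dtype=float)
    eq = np.asarray(eq, dtype=float)
    return -np.sum(((np.abs(ineq) - ineq) / 2.0) ** 2) - np.sum(eq ** 2)

def decision_function(x, efficiency, constraints, r=1.0):
    """Eq. (3): merit function to be maximized; `efficiency(x)` and
    `constraints(x)` are assumed callables returning eta(x) and (ineq, eq)."""
    ineq, eq = constraints(x)
    return efficiency(x) + r * penalty(ineq, eq)
```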

3 Transfer optimization learning framework

Many global optimization-based methods have been used to solve the problem described in Sect. 2.3, such as the GA [22] and particle swarm optimization (PSO) [23]. However, except for the optimal solutions, the samples that contain design knowledge generated during optimization are typically discarded. When a new design requirement is proposed, the optimization must start from scratch. In this paper, we attempt to use the samples generated from previous optimizations of various design requirements as a transfer source to help improve the optimization efficiency of the target design problem. In the remainder of this paper, previous design problems and the target design problem are referred to as source tasks and the target task, respectively. This method is termed "transfer optimization learning" in this paper.

In this section, the background and the overall transfer optimization learning framework are first introduced. Then, the three modules of the framework, namely the design of experiments (DoE) module, the ensemble model module, and the transfer optimization module, are described. Finally, verification is performed on a simple optimization problem to demonstrate the effectiveness of the proposed method.

3.1 Overall framework

Transfer learning is an emerging research field that uses a variety of approaches to solve new problems faster or more effectively by leveraging knowledge gained from solving similar problems [24]. According to the universal transfer theory proposed by Judd, the transfer of learning is a generalization of experience. If two tasks are related, it is possible to transfer knowledge from one task to another.

Figure 3 shows the basic concept of transfer learning. Assume a source domain \(D_{S}\) and source task \(T_{S}\) with decision function \(f_{S}\), and a target domain \(D_{T}\) and target task \(T_{T}\) with decision function \(f_{{\text{T}}}\), where \(D_{S} \ne D_{T}\) or \(f_{S} \ne f_{{\text{T}}}\). The goal of transfer learning is to improve the learning of the target decision function \(f_{{\text{T}}}\) using the knowledge in \(D_{S}\) and \(T_{S}\). The data from the source tasks are used for training and testing on the target task; note that the data distributions and domains of the source tasks and the target task might differ.

Fig. 3 Transfer learning process

To illustrate the above concept, a typical case is given. Consider \(N\) tasks, where each task \(i\) consists of design variables \({\mathbf{x}}_{i}\) and a decision function \(f_{i}\). The goal of each task \(i\) is to find \({\mathbf{x}}_{i}^{*}\) that maximizes \(f_{i}\), and all decision functions are quadratic functions of the following form

$$\begin{array}{*{20}l} f_{i} \left( {\mathbf{x}} \right) = \frac{1}{2}a_{2,i} \left\| {\mathbf{x}} \right\|_{2}^{2} + a_{1,i} {\mathbf{1}}^{{\text{T}}} {\mathbf{x}} + 3a_{0,i} ,& \quad i \in \left[ {1,N} \right] \hfill \\ \left( {a_{2,i} ,a_{1,i} ,a_{0,i} } \right) \in [ 0.1,10]^{3} ,& \quad {\mathbf{x}} \in [ - 5,5]^{3} \subset R^{3} \hfill \\ \end{array} ,$$
(4)

where \(\left( {a_{2,i} ,a_{1,i} ,a_{0,i} } \right)\) represents the background information of each task \(i\), and \({\mathbf{x}}\) represents the three-dimensional design variables. It should be noted that in practice, decision functions usually have no analytic formula. \(\left( {a_{2,i} ,a_{1,i} ,a_{0,i} } \right)\) is randomly chosen to generate \(N = 30\) different tasks. Note that the decision function is different for each task because of the different \(\left( {a_{2,i} ,a_{1,i} ,a_{0,i} } \right)\). The first 29 tasks are solved first and viewed as source tasks, and the last task to be solved is viewed as the target task. The problem is how to transfer the knowledge obtained from the source task optimizations to improve the optimization of the target task.
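A minimal Python sketch of this task setup is given below; the random seed, the helper names, and the task construction are illustrative assumptions rather than the exact implementation used later.

```python
import numpy as np

rng = np.random.default_rng(0)
N, DIM = 30, 3

# Randomly drawn background coefficients (a2, a1, a0) for each task, cf. Eq. (4).
coeffs = rng.uniform(0.1, 10.0, size=(N, 3))

def make_task(a2, a1, a0):
    """Return the quadratic decision function of one task, defined on x in [-5, 5]^3."""
    def f(x):
        x = np.asarray(x, dtype=float)
        return 0.5 * a2 * np.dot(x, x) + a1 * np.sum(x) + 3.0 * a0
    return f

tasks = [make_task(*c) for c in coeffs]
source_tasks, target_task = tasks[:-1], tasks[-1]  # 29 source tasks, 1 target task
```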

To solve the foregoing problem, a transfer optimization learning framework is proposed in this paper. Figure 4 depicts the overall framework, which is made up of three modules. Module 1 is the DoE module constructed to initialize the training samples. In the source and target task design spaces, this module generates design variable samples and evaluates the corresponding decision function values. Module 2 is the ensemble model module, which consists of two parts. (1) The data from the DoE are used to establish the surrogate models for the source tasks and the target task. (2) A linear weighted combination of these surrogate models forms an "ensemble model". Module 3 performs transfer optimization on the ensemble model to obtain the optimal solution of the target task. The construction of each module is detailed in the following sections.

Fig. 4 Overview of the transfer optimization learning framework

3.2 DoE module

In transfer learning, the better the design information contained in source samples is, the more favorable it is for target task learning. As a result, the DoE module should generate enough source samples for target tasks. Here, each sample consists of the design variables \(x\) and the decision function value \(f(x)\).

Latin hypercube sampling (LHS) [25] and adaptive sampling via Bayesian optimization (ASBO) [26] are commonly used DoE methods that can generate samples covering a wide design space. In addition, in transfer optimization learning, the target task can utilize the prior knowledge from source tasks, so its design space is not completely blind. Thus, samples generated by LHS or ASBO can be further filtered by decision function value to obtain samples concentrated in optimal regions. Based on the above analysis, the following four DoE methods are considered.

(1) LHS;

(2) ASBO;

(3) LHS combined with selecting samples concentrated in optimal regions (LHS-Opt);

(4) ASBO combined with selecting samples concentrated in optimal regions (ASBO-Opt).

The optimization steps are predefined in ASBO to ensure wider coverage of the design space. Because the initial samples in LHS and ASBO might affect the optimal solution, the sampling process can be repeated several times with different random seeds to account for randomness. Details of the experimental settings of the four DoE methods are described in Sect. 4.1, and their impact on the optimal solution is discussed in Sect. 4.3.
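A minimal sketch of the LHS and LHS-Opt sampling is shown below, assuming SciPy's quasi-Monte Carlo module; the sample counts (150 generated, 10 kept) follow the configurations later listed in Sect. 4.1, and the function names are illustrative.

```python
import numpy as np
from scipy.stats import qmc

def lhs_samples(bounds, n, seed=0):
    """Latin hypercube samples inside per-dimension [low, high] bounds."""
    bounds = np.asarray(bounds, dtype=float)
    sampler = qmc.LatinHypercube(d=len(bounds), seed=seed)
    return qmc.scale(sampler.random(n), bounds[:, 0], bounds[:, 1])

def lhs_opt_samples(bounds, f, n=150, keep=10, seed=0):
    """LHS-Opt: keep only the `keep` samples with the best decision-function values,
    i.e., samples concentrated in optimal regions."""
    X = lhs_samples(bounds, n, seed)
    y = np.array([f(x) for x in X])
    best = np.argsort(y)[-keep:]  # largest f(x), since the design goal is maximization
    return X[best], y[best]
```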

It should be noted that the samples can also be obtained from previous experiments or numerical simulations without the additional cost of generating source task samples.

3.3 Ensemble model module

As shown in Fig. 4, the ensemble model combines all surrogate models of source and target tasks, and is required for subsequent optimization of the target task. An overview of the ensemble model is shown in Fig. 5.

Fig. 5 Ensemble model overview

Assume the source task samples are generated using the foregoing DoE method and the target task samples are also obtained (as described in Sect. 3.4); then, surrogate models between the design variables and decision functions can be established for each task. In the remainder of this paper, the surrogate model for a source task is referred to as the "base model", and the surrogate model for the target task is referred to as the "target model".

3.3.1 Construction of base models and the target model

The Gaussian process (GP) [27] is chosen as the surrogate model for the source tasks and the target task. GP regression is probabilistic, data-efficient and flexible, especially on small datasets. Moreover, GPs enable the ensemble model built from them to perform data-efficient black-box BO, as described in Sect. 3.4. Assume the observations of one task are \(O = \{ \left( {x_{j} ,y_{j} } \right)\}_{j = 1}^{n}\), where \(x_{j}\) and \(y_{j}\) are the inputs and corresponding outputs of the decision function, respectively. The Gaussian process model can then be expressed as

$$f\left( x \right)\sim GP\left( {0,k\left( {x,x{^{\prime}}} \right)} \right),$$
(6)

where \(k\left( { \cdot , \cdot } \right)\) can be any kernel function, and the Matérn 5/2 kernel [17] is used in this paper. The hyperparameters of the GP model are learned by maximizing the likelihood function with the L-BFGS-B algorithm [12]. Once trained, the model can be used to predict the function mean value \(\mu \left( {x_{new} } \right)\) and variance \(\sigma^{2} \left( {x_{new} } \right)\) at a new sample point \(x_{new}\). The posterior distribution \(f(x_{new} |O)\) can be obtained by conditioning the joint probability distribution \(f\) on the known observations \(O\), with its mean value \(\mu \left( {x_{new} } \right)\) and variance \(\sigma^{2} \left( {x_{new} } \right)\) expressed as

$$\mu \left( {x_{new} } \right) = k_{new}^{T} (K + \sigma_{n}^{2} I)^{ - 1} y$$
(7)
$$\sigma^{2} \left( {x_{new} } \right) = k\left( {x_{new} ,x_{new} } \right) - k_{new}^{T} (K + \sigma_{n}^{2} I)^{ - 1} k_{new} ,$$
(8)

where \(k_{new}\) is the covariance vector between the new sample point \(x_{new}\) and the inputs of all observations, \(\sigma_{n}^{2}\) is the noise variance, \(k\left( { \cdot , \cdot } \right)\) is the kernel function, and \(K\) is an \(n \times n\) covariance matrix in which each element is \(K_{i,j} = k\left( {x_{i} ,x_{j} } \right)\).

Figure 6 shows the high-precision surrogate model established by GP regression for the following function:

$$\begin{array}{*{20}l} {y = {\text{sin}}\left( {2\pi x} \right) + \varepsilon } \hfill \\ {\varepsilon \sim {\text{ N}}\left( {0,0.04} \right)} \hfill \\ \end{array} .$$
(9)
Fig. 6 GP model on a simple function

The blue interval in Fig. 6 is the confidence interval (standard deviation), which represents the confidence of the model's prediction over the decision function. This demonstrates that the GP model fits noisy samples well.
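A minimal reproduction of this kind of fit is sketched below using scikit-learn (an assumption made for illustration; the GP implementation actually used in this paper relies on the package described in Sect. 3.4): a Matérn 5/2 kernel plus a noise term is fitted to noisy samples of Eq. (9), and the posterior mean and standard deviation of Eqs. (7) and (8) are then queried on a test grid.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(30, 1))
# Eq. (9): noise variance 0.04, i.e., standard deviation 0.2.
y = np.sin(2.0 * np.pi * X).ravel() + rng.normal(0.0, 0.2, size=30)

# Matern 5/2 kernel plus a white-noise term; hyperparameters are learned by
# maximizing the marginal likelihood (scikit-learn uses L-BFGS-B internally).
kernel = Matern(length_scale=0.2, nu=2.5) + WhiteKernel(noise_level=0.04)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_new = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
mu, std = gp.predict(X_new, return_std=True)  # posterior mean and standard deviation
```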

3.3.2 Construction of the ensemble model

Knowledge transfer from the source tasks to the target task occurs through the ensemble model. The ensemble model is a linear combination of the base models and the target model. Assume there are \(N\) tasks, including \(N - 1\) source tasks and one target task, and let \(x\) be the design variable; the ensemble model \(\overline{f}\) is still a GP with the following form:

$$\overline{f}(x|D)\sim {\text{GP}}\left( {\mathop \sum \limits_{i = 1}^{N} w_{i} \mu_{i} \left( x \right),\mathop \sum \limits_{i = 1}^{N} w_{i}^{2} \sigma_{i}^{2} \left( x \right)} \right)$$
(10)

where \(w_{i}\), \(\mu_{i} \left( x \right)\) and \(\sigma_{i}^{2} \left( x \right)\) are the weight, mean value and variance of the ith base model \(f^{i}\), respectively. This is a powerful formalism. First, it keeps standard GP-based Bayesian optimization tools available, such as closed-form expected improvement (EI) estimation [12]. Furthermore, once the base model for each source task has been trained, it remains unchanged throughout the optimization and can be directly loaded from previous training without additional training cost.
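A minimal sketch of Eq. (10) is given below; the `predict(x) -> (mean, std)` interface of the individual GP models is an assumed convention for illustration.

```python
import numpy as np

def ensemble_predict(models, weights, x):
    """Eq. (10): linearly weight the GP means and variances of the base models
    and the target model; each model is assumed to expose predict(x) -> (mu, sigma)."""
    mus = np.array([m.predict(x)[0] for m in models])
    sigmas = np.array([m.predict(x)[1] for m in models])
    w = np.asarray(weights, dtype=float)
    mu_ens = np.sum(w * mus)                    # weighted mean
    var_ens = np.sum((w ** 2) * (sigmas ** 2))  # weighted variance
    return mu_ens, np.sqrt(var_ens)
```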

The weight \(w_{i}\) of the base model \(f^{i}\) can be computed by the loss function that measures the degree to which each model can correctly rank the target function observations [12]. Given \(n_{t} > 1\) target function evaluations, the loss of \(f^{i}\) is defined as the number of misranked pairs:

$${\mathcal{L}}\left( {f^{i} ,{\mathcal{D}}} \right) = \sum\limits_{k = 1}^{{n_{t} }} {\sum\limits_{l = 1}^{{n_{t} }} {1\left( {\left( {f^{i} \left( {{\mathbf{x}}_{k}^{t} } \right) < f^{i} \left( {{\mathbf{x}}_{l}^{t} } \right)} \right) \oplus \left( {y_{k}^{t} < y_{l}^{t} } \right)} \right)} } ,$$
(11)

where the indicator \(1\left( {\left( {f^{i} \left( {{\mathbf{x}}_{k}^{t} } \right) < f^{i} \left( {{\mathbf{x}}_{l}^{t} } \right)} \right) \oplus \left( {y_{k}^{t} < y_{l}^{t} } \right)} \right)\) equals 1 if the ranking predicted by base model \(f^{i}\) disagrees with the ranking of the target task observations \(y^{t}\), and 0 otherwise. Here, the ranking loss is preferred over the squared error and log-likelihood loss because the relative location of an optimum matters more during optimization than the actual value of the prediction. Then, \(w_{i}\) can be derived as the probability that \(f^{i}\) is the model with the lowest ranking loss among all models. This probability can be estimated from the generalization error obtained by drawing bootstrap samples from the model predictions on its validation set \({\mathcal{D}}\). Assume \(S\) such samples are drawn: \(\ell_{i,s} \sim {\mathcal{L}}\left( {f^{i} ,{\mathcal{D}}_{s}^{{{\text{bootstrap}}}} } \right)\) for \(s = 1,...,S\) and \(i = 1,...,N\); then \(w_{i}\) can be computed as:

$$w_{i} = \frac{1}{S}\sum\limits_{s = 1}^{S} {\frac{{{\mathbb{I}}\left( {i \in {\text{arg}}\mathop {{\text{min}}}\limits_{i^{\prime}} \ell_{i^{\prime},s} } \right)}}{{\sum\nolimits_{j = 1}^{N} {{\mathbb{I}}\left( {j \in {\text{arg}}\mathop {{\text{min}}}\limits_{i^{\prime}} \ell_{i^{\prime},s} } \right)} }}} ,$$
(12)
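To make Eqs. (11) and (12) concrete, a small Python sketch is given below; the function names, the bootstrap size, and the tie-sharing rule are illustrative assumptions.

```python
import numpy as np

def ranking_loss(pred, y):
    """Eq. (11): number of misranked pairs between the model predictions `pred`
    and the target task observations `y`."""
    pred, y = np.asarray(pred), np.asarray(y)
    loss = 0
    for k in range(len(y)):
        for l in range(len(y)):
            loss += int((pred[k] < pred[l]) != (y[k] < y[l]))
    return loss

def model_weights(preds, y, n_bootstrap=100, seed=0):
    """Eq. (12): the weight of each model is the probability, over bootstrap
    resamples of the target observations, that it attains the lowest ranking loss."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    wins = np.zeros(len(preds))
    for _ in range(n_bootstrap):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample of observations
        losses = np.array([ranking_loss(np.asarray(p)[idx], y[idx]) for p in preds])
        best = np.flatnonzero(losses == losses.min())
        wins[best] += 1.0 / len(best)               # ties share the weight equally
    return wins / n_bootstrap
```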

After obtaining the weights of the base models, the ensemble model can then be derived using Eq. (10). A significant challenge for the weighting-based ensemble model is that as the number of base models increases, even base models with poor generalization performance can obtain nonzero weights with a certain probability. Such models degrade the performance on the target task, which is also known as the "weight dilution" problem [12]. To prevent weight dilution, a base model filtering step can be added to drop base models that have a low probability of improving over the target model. Base model \(f^{i}\), \(i \in \left\{ {1,...,N - 1} \right\}\), is discarded from the ensemble model in each optimization step according to one of the following three filtering strategies [12].

(1) Base model \(i\) is dropped when its weight \({w}_{i}\) reaches a certain threshold, which is referred to as the "drop" strategy.

(2) Base model \(i\) is discarded in each iteration with a certain probability \(p\left(i\right)\)

$$p\left( i \right) = 1 - \frac{{\mathop \sum \nolimits_{s = 1}^{S} 1(l_{i,s} < l_{t,s} )}}{S + \alpha S},$$
(13)

where the factor \(\frac{{\mathop \sum \nolimits_{s = 1}^{S} 1(l_{i,s} < l_{t,s} )}}{S + \alpha S}\) is the regularized probability that the base model \(f^{i}\) outperforms the target model, \(S\) is the sample size in Eq. (12), and \(\alpha\) is a constant that ensures \(p\left( i \right) > 0\). This is referred to as the "probabilistic" strategy.

(3) As the number of iteration steps increases, the discard probability \(p\left( i \right)\) of base model \(i\) gradually increases, which is expressed as

$$p\left( i \right) = 1 - \left[ {\left( {1 - \frac{{n_{t} }}{H}} \right)\left( {\frac{{\mathop \sum \nolimits_{s = 1}^{S} 1(l_{i,s} < l_{t,s} )}}{S + \alpha S}} \right)} \right],$$
(14)

where the factor \(\left( {1 - \frac{{n_{t} }}{H}} \right)\) reduces the probability of keeping the base model \(f^{i}\) linearly over iterations. This is referred to as the "probabilistic-ld" strategy; a sketch of the two probabilistic rules is given after this list.
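The sketch below illustrates the two probabilistic filtering rules, Eqs. (13) and (14), under stated assumptions: `loss_i` and `loss_target` hold the bootstrap ranking losses of base model \(i\) and the target model, and the default \(\alpha\) is an arbitrary small positive constant.

```python
import numpy as np

def drop_probability(loss_i, loss_target, n_t, horizon=None, alpha=0.05):
    """Probability of discarding base model i in the current iteration:
    Eq. (13) when `horizon` is None ("probabilistic"), otherwise Eq. (14)
    ("probabilistic-ld") with `horizon` playing the role of H."""
    loss_i = np.asarray(loss_i)
    loss_target = np.asarray(loss_target)
    S = len(loss_i)
    # Regularized probability that base model i outperforms the target model.
    p_better = np.sum(loss_i < loss_target) / (S + alpha * S)
    if horizon is None:
        return 1.0 - p_better
    return 1.0 - (1.0 - n_t / horizon) * p_better
```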

3.4 Transfer optimization module

To obtain the global optimal solution for the target task, this section performs optimization using the ensemble model established in Sect. 3.3. For the target design problem, assume the design variables \(x\) are the input variables, \(f(x)\) is the decision function, and the design goal is to find \({x}_{*}=\underset{x\in \chi }{\mathrm{argmax}}f(x)\), where \(\chi\) is the design space for \(x\). Based on the assumption that the source functions and the target function are partially similar, the optimal solution of the target function can be found faster and further improved through the ensemble model. This process is also the key to transfer learning, which uses the prior design knowledge in the base models to accelerate optimization. The optimal solution here refers to the optimal design result obtained under a certain calculation cost.

In this paper, a weighting-based transfer learning Bayesian optimization framework (WTLBO) is proposed to improve the solution of the foregoing optimization problem. Figure 7 compares the WTLBO learning process with that of traditional surrogate model-based optimization (SMO). Both iteratively update the target model after a newly optimized sample is added from evaluations, but they differ in the surrogate model used for optimization. SMO must construct a new surrogate model and learn from scratch for each new task. In WTLBO, by contrast, the base models in the ensemble model can influence the selection of new sample points for a new target task. Thus, past experience can be reused through the ensemble model to accelerate the learning process and improve the target decision function performance. The model's design experience will grow as historical design databases are continuously accumulated. One potential limitation of the method is that source samples from previous design tasks must be accumulated, and if there are no source tasks with similar design requirements, the improvement will be limited.

Fig. 7 Flowchart comparison of WTLBO and SMO

Assuming that \(M\) is a surrogate model for \(f(x)\) and that \(C\) represents the set of evaluated sample observations, the optimization loop in both SMO and WTLBO typically iterates the following steps to obtain the optimal solution.

(1) Select a candidate sampling point \({x}_{new}\in \chi\) by optimizing the acquisition function based on \(M\) and \(C\).

(2) Obtain the decision function value \({y}_{new}\) of \({x}_{new}\).

(3) Update the sample observations \(C=C\bigcup \{({x}_{new},{y}_{new})\}\).

(4) Update \(M\) based on \(C\), and repeat from step (1) until the maximum number of steps or convergence is reached.

In both SMO and WTLBO, an acquisition function (such as the probability of improvement, expected improvement (EI), or upper confidence bound) [17] must be chosen to generate the new sampling point \({x}_{new}\). Through trial and error, the acquisition function adopted here is EI, which balances exploration (global search) against exploitation (local search) to solve the optimization effectively while avoiding being trapped in local optima.
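The sketch below illustrates steps (1)–(4) with a closed-form EI acquisition. It is a simplified stand-in rather than the actual implementation: the acquisition function is maximized by plain random candidate search, and `surrogate_fit` is a hypothetical callable that returns either the target model or the ensemble model; the optimization in this paper relies on the package mentioned next.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best, xi=0.01):
    """Closed-form EI for maximization, given the surrogate's posterior mean and std."""
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def optimize(surrogate_fit, f, bounds, n_init=5, n_iter=100, seed=0):
    """Steps (1)-(4): `surrogate_fit(X, y)` must return a model with
    predict(X) -> (mu, std); candidates are drawn by random search here."""
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, len(bounds)))
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        model = surrogate_fit(X, y)                      # target model or ensemble model
        cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2048, len(bounds)))
        mu, std = model.predict(cand)
        x_new = cand[np.argmax(expected_improvement(mu, std, y.max()))]  # step (1)
        X = np.vstack([X, x_new])                        # steps (2)-(3): evaluate and add
        y = np.append(y, f(x_new))
        # step (4): the surrogate is refitted at the top of the next iteration
    return X[np.argmax(y)], y.max()
```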

GP is chosen as the surrogate model of SMO, so in this paper, SMO refers specifically to BO. This paper uses the SMAC3 Bayesian optimization package [28] to implement WTLBO and BO in the Python 3.6 environment.

3.5 Verification of the parametrized quadratic functions

To verify the effectiveness of the above WTLBO framework, this section performs verification on the parameterized quadratic function described in Sect. 3.1.

BO and WTLBO are used to optimize the target task, and the results are evaluated and compared. The LHS method described in Sect. 3.2 is used to generate 40 initial samples for each source task, which are utilized to perform transfer optimization learning on the target task. The "drop" strategy described in Sect. 3.3.2 is used to prevent weight dilution. When the design variables are out of range during optimization, the penalty weight \(r\) in Eq. (3) is set to \(10^{10}\).

The results are shown in Fig. 8. It is evident that WTLBO reaches the optimal value more quickly than BO, and that its optimal value is also significantly superior.

Fig. 8 Results of the WTLBO and BO on the parameterized quadratic function

4 Results and discussion

In this section, the 1D turbine design method based on WTLBO is verified on a target turbine design case, and the results are compared with other state-of-the-art optimization methods. In addition, the impact of different DoE methods and strategies for weight dilution prevention on the optimization efficiency is studied.

4.1 Experimental settings

The source tasks consist of six single-stage 1D design problems, and their overall design requirements are listed in Table 1; they cover some typical design requirements for low-pressure turbines. Most data are drawn from Ref. [29], and the others are from our lab. All tasks have seven design variables, where the aspect ratios \(h{b}_{1},h{b}_{2}\) are given according to the specific design task. The design spaces for the remaining five dimensionless velocity triangle parameters are the same, and they are within the ranges listed in Table 2. The goal of all design tasks is to maximize efficiency under their own constraints.

Table 1 Turbine source task design requirements
Table 2 Range of the design variables

As mentioned in Sect. 3.2, four DoE methods are employed to generate samples for each source task. The configurations of the four DoE methods are as follows:

(1) LHS: 150 samples;

(2) ASBO: 5 seed samples and 145 iteration steps to generate 150 samples in total;

(3) LHS-Opt: 10 optimal samples filtered from the LHS samples in (1);

(4) ASBO-Opt: 10 optimal samples filtered from the ASBO samples in (2).

The average time spent generating the DoE samples and training the base model of each source task is 104.2 s using LHS and 189.6 s using ASBO. Filtering the optimal samples from the samples generated by LHS or ASBO adds almost no time cost. Although additional cost is required to construct these base models, it should be noted that in practical design work, the source samples can be obtained directly from previous design tasks at almost no additional cost. In addition, once trained, the base models can be directly loaded for future target tasks without retraining cost.

The target task is to redesign a single-stage base turbine. The mass flow rate of the base turbine is 3.5 kg/s, its expansion ratio is 3.5, the efficiency at the design point is 0.882, the total inlet temperature is 1175 K, and the total inlet pressure is 453 kPa. The rotation speed is 38,000 r/min, the inlet flow angle is 90°, the maximum outer radius does not exceed 158 mm, and the outlet flow angle should be close to the axial direction.

In the following experiment, the default DoE method is LHS, and the default weight dilution prevention strategy is the “drop” strategy. All experiments were performed on a Win10 machine with 16 Intel-i7 4.0 GHz CPU cores and 32 GB RAM.

4.2 Comparison of results from WTLBO and other optimization methods

Three global optimization methods to be compared are described first.

(1) GA, which was used by Qin [7] to perform 1D turbine design optimization;

(2) PSO, which was used by Yao [21] to perform multistage 1D turbine design optimization;

(3) BO, on which WTLBO is built.

Optimizations for the 1D turbine design based on GA and PSO are implemented using pyOptSparse software [30] with default configurations. The optimizations were run five times with different seed samples to obtain the averaged result, accounting for the impact of randomness on the results.

Table 3 lists the optimal results of all models, where "Time" stands for the time taken for a method to converge. It can be seen that the optimal aerodynamic efficiency obtained by the four methods is the same. Figure 9 compares the optimal aerodynamic efficiency achieved by the four methods at different iteration steps. Stochastic optimization methods such as GA and PSO require more than 800 evaluations to obtain the optimal solution, while the BO algorithm converges to the optimal solution faster (about 150 steps used) than GA and PSO. WTLBO (in bold) can achieve the same aerodynamic efficiency in only 105 steps. Thus, the optimization steps of WTLBO are 30% less than those of BO and almost an order of magnitude less than those of GA and PSO. As a result, the proposed method can accumulate prior experience and find the optimal design variables efficiently when encountering similar design tasks.

Table 3 Results of different methods on the target design task
Fig. 9 Optimization results on the target task during optimization

To analyze the knowledge transferred from the source samples by the proposed method, Fig. 10 depicts the weight of each base model (Eq. 10) in the ensemble model within 75 optimization steps. As the number of steps increases, the weight of the base model corresponding to the fifth source task (BaseModel-5) also increases. When the number of iteration steps reaches 12, the weight of BaseModel-5 exceeds 70%, indicating that this model provides more information for the target task. It also implies that source samples can provide prior knowledge for learning the target task at an early stage, thereby increasing design efficiency. From a physical standpoint, this can be related to the fact that the mass flow rate and expansion ratio of the fifth source task are similar to those of the target task. After 55 optimization steps, the weight of the target model in the ensemble model approaches 100%, indicating that the target task relies mainly on its own target model and no longer needs prior design knowledge.

Fig. 10 Basic model weights in the ensemble model

4.3 Impact of strategies for DoE and weight dilution prevention

In transfer learning, the source samples have a significant impact on the learning of the target task. To investigate this impact, source samples generated by the four DoE methods (LHS, ASBO, LHS-Opt, and ASBO-Opt) are studied. The optimal aerodynamic efficiency of the target turbine obtained under these four configurations is compared in Fig. 11. It can be seen that the optimal efficiency is lowest when using the samples generated by ASBO, which is even worse than using the baseline LHS samples. This is because the ASBO samples may contain some samples that are not suitable for the target task, which introduces interference into the learning of the target task and thus reduces the optimal efficiency. The optimal efficiency obtained using the filtered samples (LHS-Opt and ASBO-Opt) is significantly higher than that of the other two DoE methods. Furthermore, the optimal efficiency of ASBO-Opt is the highest. This is because ASBO-Opt provides a reasonable initial solution for BO, which is similar to the way human experts summarize design experience.

Fig. 11 Optimization results with four different source samples

The impact of the weight dilution prevention strategies on target task learning is investigated further using the samples generated by ASBO-Opt. The three strategies described in Sect. 3.3.2 are compared, and Fig. 12 shows the corresponding optimization results. It can be seen that the "drop" strategy outperforms the other two strategies. This is due to the earlier discarding of inappropriate base models, which is beneficial to improving the accuracy of the ensemble model for the target task.

Fig. 12 Results with three different weighting strategies

4.4 Full 3D numerical simulation verification

Because the accuracy of the loss model is relatively limited, the foregoing results are further validated with high-fidelity full 3D numerical simulations. Table 4 displays the 1D aerodynamic parameters of the base turbine (referred to as BASE) and of the design obtained by WTLBO (referred to as OPT). The 3D blades of the stator and rotor are constructed from three blade sections at the blade root, middle, and tip. The turbine profile of each blade section is generated by the Prichard parameterization method [31]. When constructing the 3D blades, the blade solidity and loading distributions at each blade section are repeatedly tuned for both OPT and BASE. The blade area of each rotor section is controlled to meet the strength requirements. The stator is stacked along the leading edge of the blade sections, while the rotor is stacked along the center of gravity of each blade section.

Table 4 1D parameters on the target turbine

The 3D simulations are performed using ANSYS CFX software to solve the 3D steady viscous Reynolds-averaged Navier–Stokes equations with a time-marching finite volume method. The second-order upwind scheme is adopted for spatial discretization. The shear stress transport (SST) model, which was verified in Ref. [32], is used for turbulence closure. The meshes are generated by Numeca Autogrid software, and the computational domain and grid of OPT are shown in Fig. 13. The total temperature, total pressure, and flow angle at the domain inlet and the static pressure at the domain outlet are given as boundary conditions. The calculated y+ is kept below 1.2 to meet the requirement of the chosen turbulence model. The tip gap is set to 1% of the rotor blade height.

Fig. 13 Computational domain and mesh for OPT

Table 5 shows the results of the 3D numerical simulation. OPT has almost the same mass flow rate and total expansion ratio as BASE. The OPT efficiency is 0.894, which is an increase of 1.2% over BASE. Because 3D effects cannot be well predicted by the loss model, the efficiency calculated by the 3D simulation is slightly lower than the 1D prediction listed in Table 3.

Table 5 Three-dimensional numerical results

Figure 14 depicts the suction surface streamlines and pressure contours of BASE and OPT. The secondary loss is reduced in OPT, as indicated by the smaller tip leakage area in the streamlines. This may be because the OPT loading coefficient is lower than that of BASE, resulting in a lower pressure gradient between the rotor blade passages. Figure 15 compares the static entropy of the two turbines near the rotor blade outlet. The entropy of BASE is greater than that of OPT near the hub and shroud, which indicates that a larger loss is generated. Figure 16 further shows the efficiency of the two turbines at off-design points with rotation speeds of 80%, 90%, 100%, and 110% and expansion ratios of 1.5, 2.5, 3.0, and 3.5. Compared with BASE, the efficiency of OPT is also increased by more than 1% at all off-design points.

Fig. 14 Suction surface streamline and pressure contour on the rotor blade

Fig. 15 Static entropy for a BASE and b OPT near the rotor blade outlet

Fig. 16 Efficiency of OPT and BASE at off-design points

5 Conclusions

In this paper, an efficient 1D turbine aerodynamic design method based on transfer learning and Bayesian optimization is proposed. It can transfer design experience from previous turbine design optimizations. Given the overall aerodynamic design requirements and constraints, the optimal 1D velocity triangle design parameters can be efficiently obtained. The main conclusions of this paper are as follows.

(1) This paper proposes a novel transfer optimization learning framework, which can learn from previous designs and then transfer that knowledge to speed up optimization of the target task. As the dataset grows and the accuracy of the data improves, the framework can learn and expand its design experience autonomously.

(2) Compared with the base turbine, the aerodynamic efficiency of the turbine designed by the proposed method is improved by more than 1% at design and off-design points. In addition, when compared to BO and the stochastic optimization methods (GA and PSO), the number of optimization evaluations required by our method is reduced by 30% and by an order of magnitude, respectively, while maintaining the same aerodynamic efficiency.

(3) The source samples generated by the DoE methods and the weight dilution prevention strategy used to build the ensemble model both have a significant effect on performance. The optimal solution can be improved by using source samples concentrated in optimal regions. As the number of source task samples grows, using the "drop" strategy to directly discard the corresponding base models helps eliminate the weight dilution problem and speeds up the optimization of the target design task.

(4) One potential drawback of the proposed method is that it requires the accumulation of source samples from previous design tasks, and if no similar source samples exist for the target task, the improvement will be limited. Moreover, the proposed transfer learning framework can be extended to high-dimensional design in the turbomachinery field, and future work will focus on expanding the method to design 2D and 3D turbine blades.