1 Introduction

Helicopters experience high level of vibrations compared to other flight vehicles due to a significantly higher degree of aeroelastic interaction and rapidly rotating flexible blades [21]. The vibratory loads in helicopters typically emanate from the main rotor and can result in fatigue damage of important structural components, cause human discomfort and reduce the efficacy of weapon systems. Therefore, considerable research has been directed towards accurate modelling of helicopter rotor blades [35, 48]. Rotorcraft analysis is typically conducted using comprehensive codes [22]. These codes are needed to provide aeromechanics predictions for helicopter properties such as performance, vibration and aeroelastic stability.

Composite materials have been a natural choice for helicopter rotor blades owing to their superior strength, high stiffness-to-weight ratios and other properties which can be tailored based on requirements. In this context, the underlying assumption of multiple existing studies is that the aeroelastic response of composite helicopter rotor blade corresponding to deterministic physical (input) parameters, replicates the actual behaviour. This assumption often proves to be invalid especially for practical industrial applications where the presence of uncertainty is inevitable. Therefore, besides enhancing the fidelity of the deterministic model of composite rotor blades, the quantification of the response variation because of manufacturing anomalies is equally important, if not more so, for a more realistic description of the system behaviour. Moreover, the effect of manufacturing variability on the aeroelastic response may be aggravated due to the inherent nonlinearities and structural and aerodynamic interactions.

Consequently, active research has been carried out over the years to quantify the influence of uncertainties on the dynamic response of aerospace structures. The first comprehensive review of uncertainty quantification (UQ) in aeroelasticity can be found in [43]. For a more recent survey on the same topic, [5, 12] is recommended. However, the literature seems to be quite scarce when it comes to specific works related to the intersection of aeroelasticity, UQ, composite modelling and helicopter rotor blades. The honorable mentions of existing works falling under the above multi-disciplinary intersection are as follows:

For the first time, the influence of composite material uncertainty on the aeroelastic properties of a helicopter rotor was investigated in [39]. In particular, they studied the effect of aleatoric uncertainties on the aeroelastic response of the helicopter rotor and vibratory hub loads. Manufacturing constraints were introduced within the multidisciplinary rotor blade optimization framework in [32]. In doing so, durability and fatigue analysis were performed coupled with a probabilistic robust design methodology to reduce the effects of material, geometry (particularly, shape) and loading uncertainties on the rotor blade structural performance. The influence of spatially varying material properties were studied on the aeroelastic code predictions (e.g., rotating natural frequencies, vibratory hub loads, etc.) of composite helicopter rotors in [40]. A stochastic spectral method was employed combining Karhunen–Loéve expansion and high dimensional model representation to reduce the computational cost. Epistemic uncertainty modelling was performed in [6], illustrating high sensitivity of vibratory loads on the optimal design of inflow models. The authors went on to demonstrate the high sensitivity, such that the optimal configuration of one inflow model performed worse than the baseline design when evaluated with a different inflow model. The influence of material and manufacturing uncertainties of a composite UH-60A helicopter rotor blade model were propagated to the beam properties, the rotating natural frequencies, the aeroelastic response, and vibratory loads in hover and forward flight [44]. A micro-mechanical stochastic approach was undertaken by varying the fiber orientations of the box-spar of high-fidelity rotor-blade models. An experimental technique was recently devised for flutter speed UQ as a stochastic structural modification problem considering manufacturing tolerances, damage and degradation [2].

The common aspect and key take-away from findings of the above articles is that perturbations in the material and geometric parameters of a composite helicopter rotor can lead to significant fluctuations in the aeroelastic dynamic response, thereby accentuating the requirement for stochastic analysis. In this context, UQ has retained its popularity since past few decades. Despite its usefulness, it is computationally expensive to implement in large-scale systems [53]. Therefore, cost-effective non-intrusive UQ tools can be useful for analyzing such computationally demanding systems as they entail detailed numerical models and sophisticated deterministic solvers in which one cannot readily modify the existing framework to set up the necessary propagation tools. This is altogether more relevant for rotor analysis as the governing equations of rotorcraft aeroelasticity are typically nonlinear and any alteration of the rotorcraft analysis software needs domain specialists.

Monte Carlo Simulation (MCS) is the most widely used and simplest approach for stochastic response analysis [41]. However, MCS requires large number of simulations and thus, proves to be inefficient for large-scale detailed models. Substantial research has been carried out to improve the computational framework of MCS. In contrast to MCS and its variants, which are essentially sampling based approaches, non-sampling-based techniques, such as surrogate models are computationally viable alternative to the former [31]. These models map the input-output relationship and approximate the functional space with the help of small number of actual physics-based high-fidelity simulations. This reduces the computational effort significantly. Some recent research has used surrogate modeling for aerospace analysis and design. Batrakov et al. [3] performed optimization for the rear fuselage of a helicopter using genetic algorithms and Kriging surrogate models. Lu et al. [34] used a Kriging surrogate model of the objective function along with genetic algorithm to reduce the adverse effects of aerodynamic interactions on UH-60 type fuselage of a helicopter. Kontogiannis et al. [27] used Kriging and co-Kriging based multifidelity surrogates for aerodynamic optimization. Extensive review of surrogate models can be found in [15, 19].

Although machine learning (for example, radial basis function neural network [29], recurrent neural network and multi-layer perceptron [30]) has been employed in solving inverse problems (for example, structural health monitoring, damage detection and model updating) for rotor blade applications previously, however, we observed that the literature is scarce when it comes to the application of ML for forward stochastic aeroelastic response analysis of rotors. In this context, it is worth mentioning a recent work which has employed deep learning to emulate and extrapolate from the limited experimental responses of rotorcraft available as raw sensor (accelerometer) data and create a ’virtual sensor’ for better understanding of their vibration behaviour [36]. A data-driven framework was proposed to develop safety-based diagnostics for rotorcrafts and to define the process of selecting a single, airworthy ML-based diagnostic classifier that replaces a suite of fielded condition indicators (CI) [54]. A high-performance parallel computing framework for deep neural network (DNN) hyperparameter search using evolutionary optimization was proposed for nonlinear high-dimensional multivariate regression problems for condition monitoring of rotorcrafts [17]. The developed DNN models were capable of mapping existing CI to helicopter oil cooler vibration spectra and thereby infer the quality of the internal bearing faults [18]. The above works are a part of improving the Health and Usage Monitoring Systems (HUMS) in rotorcrafts via ML, initiated by the US Army Aviation Engineering Directorate (AED) [55]. HUMS evaluate CI to quantify the rotorcraft health state from operational flight data collected from on-board sensors. It is to be noted that the above works are data-driven ML-based rotorcraft operational analyses and did not account for (i) physics-based modelling or (ii) any form of uncertainties. Drawing motivation from the above works, this work builds upon generating ML models on limited and expensive synthetic rotor response datasets resulting from high-fidelity physics-based models for accurate and efficient UQ.

Specifically, the following points have motivated our research: (i) generation of response data by solving detailed physics-based models is computational expensive and (ii) input-response relationship is strongly nonlinear. While (i) can be addressed by conventional surrogate models, we investigate various specialized ML techniques and utilize their multi-layered architecture in this work for ensuring satisfactory approximation accuracy (point (ii)). To be precise, our work attempts to improve upon the accuracy aspect by metamodeling of the stochastic rotor responses.

To the best of the authors’ knowledge, this is the first application of advanced machine learning-driven stochastic aeroelastic analysis of helicopter rotor blades. The rest of the paper is organized in the following sequence. The aeroelastic analysis is discussed briefly in Sect. 2. The stochastic response by ML is illustrated in Sect. 3. In Sect. 4, the numerical study is undertaken and the results are interpreted. Finally, the work is summarized in Sect. 5.

2 Aeroelastic analysis

For realistic prediction of helicopter vibratory hub loads, an aeroelastic analysis is required. A comprehensive aeroelastic analyses software has been created to address this issue [21]. This aeroelastic analysis is briefly elucidated below. The equations in this section are adopted from [21] and are provided here for completeness.

2.1 Governing equations of motion

The helicopter is modeled as a nonlinear model of multiple elastic rotor blades coupled to a six-degree-of-freedom rigid fuselage. Each blade displays flap (out-of-plane) bending (w), lag (in-plane) bending (v), elastic twist (torsion) (\(\phi\)) and axial displacement (u) as shown in Fig. 1. The equations of motion are derived using the generalized Hamilton’s principle developed for non-conservative systems:

$$\begin{aligned} \delta {\Pi } = \int _{\psi _{1}}^{\psi _{2}}( \delta {U} - \delta {T} - \delta {W} ) d{\psi } = 0 \end{aligned}$$

where \(\delta {U}\), \(\delta {T}\) and \(\delta {W}\) represent the virtual variations of strain energy, kinetic energy and virtual work performed by an external force, respectively, and \(\delta {\Pi }\) indicates the total potential of the system. The \(\delta {U}\) and \(\delta {T}\) are derived using the Hodges and Dowell approach and incorporate a moderate deflection theory [21]. The external aerodynamic forces acting on the rotor blade add to the virtual work variational \(\delta {W}\). The aeroelastic analysis applied in this paper considers aerodynamic forces and moments which are calculated using free wake analysis, incorporate a reverse flow model and address time domain unsteady aerodynamics.

Fig. 1
figure 1

Elastic rotor blade

2.2 Finite element-spatial discretization

The governing equations of motion are converted to discrete form using finite element (FE) analysis. This analysis is valid for nonuniform blade properties. Once discretized, equation (1) is expressed as:

$$\begin{aligned} \delta {\Pi } = \int _0^{2\pi } \sum _{i=1}^{N} (\delta {U_i} - \delta {T_i} - \delta {W_i}) d{\psi } = 0 \end{aligned}$$

The beam is divided into N spatial finite elements. Each of the N finite elements incorporates fifteen degrees of freedom. These fifteen degrees of freedom incorporate six degrees of freedom at each boundary node (axial and torsion displacement, flap and lag bending displacement, flap and lag bending slope) and two internal nodes for axial displacement and one internal node for torsion displacement. These degrees of freedom correspond to cubic variations in axial elastic and (flap and lag) bending deflections, and quadratic variation in elastic torsion. Between the elements, there is continuity of slope and displacement for flap and lag bending deflections and continuity of displacements for elastic twist and axial deflections. This FE guarantees physically consistent linear variations of bending moments and torsion moments and quadratic variations of axial force inside the elements. The shape functions used here are Hermite polynomials for lag and flap bending and Lagrange polynomials for axial and torsion deflection and are given in [7]. In this paper, cantilever boundary conditions are considered and the rows and columns corresponding to the root node in the global mass, stiffness, damping matrices and the force vector are discarded. For the numerical results, a non-uniform mesh with thirteen elements is used.

Substituting u = Hq ( H is the shape function matrix) into the expression for Hamilton’s principle yields:

$$\begin{aligned}&\int _{0}^{2\pi } \delta {\mathbf{q}}^{\mathrm{T}}\; \Big ({\mathbf{M}}{ {\ddot{\mathbf{q}}}}(\psi ) + {{\mathbf{C}}}{{\dot{\mathbf{q}}}}(\psi ) + {\mathbf{Kq}}(\psi ) - {{\mathbf{F}}}({\mathbf{q}},{ {\dot{\mathbf{q}}}},\psi )\Big )\;{\mathrm{d}}\psi = 0 \end{aligned}$$

The space functionality is thus eliminated by applying FE discretization and the governing partial differential equations are reduced to ordinary differential equations.

2.3 Normal mode transformation

Each rotor blade is modeled using FE equations. These equations are transformed into normal mode space to facilitate the computationally efficient solution of blade response. The displacements are enunciated with respect to normal modes as \({\mathbf{q}}=\varvec{\Phi } {\mathbf{p}}\). Substituting \({\mathbf{q}}=\varvec{\Phi } {\mathbf{p}}\) in Eq. (3) yields the equations in normal mode coordinates:

$$\begin{aligned} \int _{0}^{2\pi } \delta {\mathbf{p}}^{\mathrm{T}}\; \Big ({{{\bar{\mathbf{M}}}}}{{\ddot{\mathbf{p}}}}(\psi ) + {{{\bar{\mathbf{C}}}}}{{\dot{\mathbf{p}}}} (\psi ) + {{{\bar{\mathbf{K}}}}\mathbf{p}}(\psi ) - { {{\bar{{{\mathbf{F}}}}}}}({\mathbf{p}},{{\dot{\mathbf{p}}}},\psi )\Big )\;{\mathrm{d}}\psi = 0 \end{aligned}$$

where the mass, stiffness and damping matrices and the force vector in the normal mode space are expressed as \({\bar{\mathbf{M}}}={\Phi }^{T}{\mathbf{M}}\Phi\), \({\bar{{\mathbf{C}}}}={\Phi }^{T}{\mathbf{C}}\Phi\), \({\bar{\mathbf{K}}}={\Phi }^{T}{\mathbf{K}}\Phi\) and \({\bar{{\mathbf{F}}}}={\Phi }^{T}{\mathbf{F}}\) , respectively. Integration of Eq. (4) by parts yields:

$$\begin{aligned} \int _{0}^{2\pi } \left\{ \begin{array}{c} \delta {\mathbf{p}}\\ \delta \dot{\mathbf{p}}\\ \end{array} \right\} \left\{ \begin{array}{c} \bar{\mathbf{F}}-\bar{\mathbf{C}}\dot{\mathbf{p}}-\bar{\mathbf{K}}{\mathbf{p}}\\ \bar{\mathbf{M}}\dot{\mathbf{p}}\\ \end{array} \right\} d\psi = \left\{ \begin{array}{c} \delta {\mathbf{p}}\\ \delta \dot{\mathbf{p}}\\ \end{array} \right\} \left\{ \begin{array}{c} \bar{\mathbf{M}}{\dot{\mathbf{p}}}\\ 0\\ \end{array} \right\} \Bigg |_{0}^{2\pi } \end{aligned}$$

The right hand side of the aforementioned equation vanishes due to the periodic nature of the rotor steady state response. Consequently, Eq. (5) generated the system of first order differential equations:

$$\begin{gathered} \int_{0}^{{2\pi }} \delta {\mathbf{y}}^{{\text{T}}} {\mathbf{Q}}\;{\text{d}}\psi = 0 \hfill \\ {\text{where}}\;\;\;\;\;{\mathbf{y}} = \left\{ {\begin{array}{*{20}c} {\mathbf{p}} \\ {{\mathbf{\dot{p}}}} \\ \end{array} } \right\}\;\;{\text{and}}\;\;\;\;\;\;{\mathbf{Q}} = \left\{ {\begin{array}{*{20}c} {\overline{{\mathbf{F}}} - \overline{{\mathbf{C}}} {\mathbf{\dot{p}}} - \overline{{\mathbf{K}}} {\mathbf{p}}} \\ {\overline{{\mathbf{M}}} {\mathbf{\dot{p}}}} \\ \end{array} } \right\} \hfill \\ \end{gathered}$$

For the numerical results, four flap, three lag, two torsion and one axial mode are used.

2.4 Finite element-temporal discretization

The abovementioned equation is nonlinear since the force vector \(\bar{\mathbf{F}}\) incorporates nonlinear terms. These periodic, nonlinear, ordinary differential equations are solved to yield the blade steady response. Here, the FE in time is applied in conjunction with the Newton-Raphson method. We now discretize Eq. (6) over \(N_{t}\) time elements around the circumference or the rotor disk (where \(\psi _{1}=0, \psi _{N_{t}+1}=2\pi\)). Then, we consider a first order Taylor’s series expansion about the steady-state value \({\mathbf{y}}_{\mathrm{0}}=[{\mathbf{p}}^{\mathrm{T}}_{\mathrm{0}}\;\; \dot{\mathbf{p}}^{\mathrm{T}}_ \mathrm{0} ]^{\mathrm{T}}\). This process yields the following algebraic equations.

$$\begin{aligned}&\sum _{i=1}^{N_{t}}\int _{\psi _{i}}^{\psi _{i+1}}\delta {\mathbf{y}}^{T}_{i} {\mathbf{Q}}_{i}({\mathbf{y}}_{0}+ \triangle {\mathbf{y}}) d\psi = \sum _{i=1}^{N_{t}}\int _{\psi _{i}}^{\psi _{i+1}}\delta {\mathbf{y}}^{T}_{i}[{\mathbf{Q}}_{i}({\mathbf{y}}_{0}) + \mathbf{K}_{ti}({\mathbf{y}}_{0}) \triangle {\mathbf{y}}] d\psi =0 \end{aligned}$$
$$\begin{aligned}&\mathbf{K}_{ti}=\left[ \begin{array}{cc} \frac{\partial \bar{{\mathbf{F}}}}{\partial {\mathbf{p}} } - {\bar{\mathbf{K}}} &{} \frac{\partial \bar{{\mathbf{F}}}}{\partial \dot{\mathbf{p}} } - {\bar{{\mathbf{C}}}}\\ 0 &{} {\bar{\mathbf{M}}}\\ \end{array}\right] _i \end{aligned}$$

Here, \(\mathbf{K}_{ti}\) represents the tangential stiffness matrix for time element i. Furthermore, \({\mathbf{Q}}_{i}\) is the load vector for time element i. The modal displacement vector can be written as follows:

$$\begin{aligned} {\mathbf{p}}_i(\psi )={\mathbf{H}}(s){\mathbf{r}}_i \end{aligned}$$

Here, \({\mathbf{H}}(s)\) are time shape functions (in terms of the element coordinates) which are fifth-order Lagrange polynomials [8] used for approximating the normal mode coordinate \({\mathbf{p}}\). \({\mathbf{r}}\) is the temporal nodal coordinate needed to describe the variation of \({\mathbf{p}}\) within the element. Continuity of generalized displacements is enforced between the time elements. A Lagrange–Hermite polynomials are used for interpolation inside the time element. For the numerical results, a uniform mesh with eight time elements is used.

Now, we substitute Eq. (9) and its derivative into Eq. (7). Thereafter, an iterative solution provides the blade steady response.

2.5 Aerodynamic loads

The air velocity in the blade-deformed plane is computed first. The blade airloads in the rotating deformed frame are then determined after applying two-dimensional strip theory. Wake model is used for the inflow and time domain unsteady aerodynamics is invoked.

2.6 Rotor and hub loads

The force summation method is applied to determine the steady and vibratory components of the rotating frame blade loads. In the force summation method, the blade inertia and aerodynamic forces are first integrated along the length of the blade. Then, the fixed frame hub loads are computed after summing the contributions of individual blades at the root.

2.7 Coupled trim

The helicopter must be trimmed for a proper assesment of the loads. The trim procedure for the helicopter mandates calcuating the pilot control angles \({\varvec{\Theta }}\) which cause the six steady forces and moments acting on the helicopter to vanish. The solution of the nonlinear trim equations is performed using a Newton-Raphson procedure and many iterations are typically performed. A process known as coupled trim is conducted to solve the pilot input trim controls, blade steady response and vehicle orientation, simultaneously. This method is called coupled trim because the blade response equations and trim equations are simultaneously solved to incorporate the effect of elastic blade deflection on the rotor steady forces. Further information about the aeroelastic analysis including derivations of the blade governing equations is expounded in reference [21].

3 Stochastic analysis via machine learning

3.1 General computational framework

A physical system governed with the help of a set of equations (for example, differential equations), the general input-output functional form of the model can be expressed as

$$\begin{aligned} {\mathbf{y}} = {\mathbf{M}}({\mathbf{x}}) \end{aligned}$$

where \({\mathbf{x}}\in {\mathbb {R}}^M\) is a vector of input parameters of the model, representing the geometrical details, the material model and the loading. \({\mathbf{y}}\in {\mathbb {R}}^Q\) is the vector of response quantities of interest such as,

  • The displacement response or its related components,

  • Natural frequency, modal contribution factors and other response components in the eigen-space,

  • Strain and stress component tensor at specified locations,

  • Plastic strain and other internal damage measuring metrics,

  • Spatial and temporal evolution of the above quantities.

Here, our motive is to set up a generalized non-intrusive framework for UQ, in which the computational model \({\mathbf{M}}\) can be construed as a black box, i.e., the model configuration settings cannot be edited by the user at any point and will only yield unique response values for each combination of the input vector. Also, \({\mathbf{M}}\) is deterministic in nature as repeating the analysis with the same set of input parameters more than once will lead to the same exact value of the output response quantity. The stochastic input parameters can be modelled by random realizations of the vector \({\mathbf{x}}\in {\mathbb {R}}^M\) in accordance to the particular probability density function \(f_x({\mathbf{x}})\). The conventional techniques involve the use of statistical inference based approaches, such as maximum likelihood estimate and criteria like, Akaike and Bayesian information for selecting the best fit distribution [45, 51]. Alternative approaches include Bayesian statistics which can supplement the model prediction by utilizing measurement data in conjunction with the system physics and the maximum entropy approach for cases where there is scarce or no data.

3.2 Gaussian process modelling

The Gaussian process (GP) is a surrogate modelling technique which fits probability distributions over functions. GP is a spatial interpolation technique originally developed for geostatistics [28].

The functional form is expressed below by considering an independent variable \({\mathbf{x}} \in {\mathbb {R}}^d\) and function \(g({\mathbf{x}})\) such that \(g : {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), a GP over \(g({\mathbf{x}})\) with mean \(\mu ({\mathbf{x}})\) and covariance function \(\kappa ({\mathbf{x}},{\mathbf{x}}';\Theta )\) can be defined as

$$\begin{aligned} \begin{aligned}&g({\mathbf{x}}) \sim GP (\mu ({\mathbf{x}}),\kappa ({\mathbf{x}},{\mathbf{x}}';\Theta )),\\&\mu ({\mathbf{x}}) = {\mathbb {E}}[g({\mathbf{x}})]\\&\kappa ({\mathbf{x}},{\mathbf{x}}';\Theta ) = {\mathbb {E}}[(g({\mathbf{x}})-\mu ({\mathbf{x}}))(g({\mathbf{x}}')-\mu ({\mathbf{x}}'))] \end{aligned} \end{aligned}$$

where \(\Theta\) represents the hyperparameters of the covariance function \(\kappa\). The covariance function \(\kappa\) models any prior knowledge about \(g({\mathbf{x}})\) and can cope with the approximation of arbitrary complex functions. In a way, the covariance function brings in interdependencies between the function value corresponding to different inputs. The squared exponential (Gaussian) covariance function illustrated in Eq. (12) is used here.

$$\begin{aligned} \begin{aligned} \kappa ({\mathbf{x}},{\mathbf{x}}') = \sigma _g^2\exp \left[ -\sum _{i=1}^{d}\frac{(x(i) - x'(i))^2}{2r_i^2} \right] \end{aligned} \end{aligned}$$

where \(\{\sigma _g,r_1,\ldots ,r_d\} = \Theta\) are the hyperparameters of the covariance function.

One perspective of viewing GP is the function space mapping describing the input-output relationship [45]. As opposed to conventional modelling techniques which employ fitting a parameterized mathematical form to map the input-output functional space, a GP does not assume any explicit form, instead holds a prior belief (in the form of the mean and covariance function) onto the space of model (response) functions. Thus, GPs can be classified as a ’non-parametric’ model as the number of parameters in the model are governed by the number of available data points.

Universal Kriging (a general form of GP) is employed here [33]. This constitutes a second-order polynomial trend function and GP as shown below

$$\begin{aligned} \begin{aligned} \mathbf{Y}({\mathbf{x}}) = \sum _{j=1}^{p} \beta _j \mathbf{f}_j({\mathbf{x}}) + \mathbf{Z}({\mathbf{x}}) \end{aligned} \end{aligned}$$

where \({\varvec{\beta }} = \{ \beta _j, j=1,\ldots , p \}\) is the vector of unknown coefficients and \({{\mathbf{F}}} = \{ \mathbf{f}_j, j=1,\ldots , p \}\) is the matrix of polynomial basis functions. \(\mathbf{Z}({\mathbf{x}})\) is the GP with zero mean and autovariance \(\text {cov}[\mathbf{Z}({\mathbf{x}}),\mathbf{Z}(\mathbf{x'})] = \sigma ^2 {\mathbf{R}}({\mathbf{x}},\mathbf{x'})\), where \(\sigma ^2\) is the process variance and \({\mathbf{R}}({\mathbf{x}},\mathbf{x'})\) is the autocorrelation function.

The parameters \({\varvec{\beta }}\) and \(\sigma ^2\) can be estimated by the maximum likelihood estimate (MLE) defined by the following optimization problem under the assumption that the noise \(\mathbf{Z} = \mathbf{Y} - {{\mathbf{F}}} {\varvec{\beta }}\) is a correlated Gaussian vector

$$\begin{aligned}&(\hat{\varvec{\beta }},\hat{\sigma ^2}) = \underset{ {\varvec{\beta }},\sigma ^2}{\text {arg max}}\; \mathbf{L}({\varvec{\beta }},\sigma ^2|{\mathbf{y}}) = \frac{1}{((2 \pi \sigma ^2)^n {\text{ det }} {\mathbf{R}})^2}\nonumber \\&\exp \left[ - \frac{1}{2\sigma ^2} ({\mathbf{y}} - {{\mathbf{F}}}{\varvec{\beta }})^T {\mathbf{R}} ^{-1}({\mathbf{y}} - {{\mathbf{F}}}{\varvec{\beta }}) \right] \end{aligned}$$

Upon solving Eq. (14), the estimates \((\hat{\varvec{\beta }},\hat{\sigma ^2})\) can be obtained as

$$\begin{aligned} \begin{aligned} \hat{\varvec{\beta }}&= ({{\mathbf{F}}}^T {\mathbf{R}}^{-1} {{\mathbf{F}}})^{-1} {{\mathbf{F}}}^T {\mathbf{R}}^{-1} {\mathbf{y}} \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned} \hat{\sigma ^2}&= \frac{1}{n} ({\mathbf{y}} - {{\mathbf{F}}}{\varvec{\beta }})^T {\mathbf{R}} ^{-1}({\mathbf{y}} - {{\mathbf{F}}}{\varvec{\beta }}) \end{aligned} \end{aligned}$$

where \({\mathbf{y}}\) represents the model response such that \({\mathbf{y}} = \{ y_1, \ldots , y_n \} ^T\).

The prediction mean and variance by GP can be obtained as

$$\begin{aligned} \begin{aligned}&\mu _{\hat{Y}}({\mathbf{x}}) = {{\mathbf{F}}}^T \hat{\varvec{\beta }} + {\mathbf{r}}^T {\mathbf{R}}^{-1} ({\mathbf{y}} - {{\mathbf{F}}}\hat{\varvec{\beta }}) \end{aligned} \end{aligned}$$
$$\begin{aligned} \begin{aligned}&\sigma ^2_{\hat{Y}}({\mathbf{x}}) = \hat{\sigma ^2}[1 - {\mathbf{r}}^T {\mathbf{R}}^{-1}{\mathbf{r}} + \mathbf{u}^T ({{\mathbf{F}}}^T {\mathbf{R}}^{-1} {{\mathbf{F}}})^{-1} \mathbf{u}] \end{aligned} \end{aligned}$$

where \(\mathbf{u} = {{\mathbf{F}}}^T {\mathbf{R}}^{-1} {\mathbf{r}} - {\mathbf{R}}\) and \({\mathbf{r}}\) are the autocorrelation between unknown point \({\mathbf{x}}\) and each point of the observed dataset.

Some unique features of the above formulation are: (i) The prediction is exact at the training points and the associated variance is zero. (ii) It is asymptotically zero which means as the size of the observed dataset increases, the overall variance of the process decreases. (iii) The prediction at a given point is considered as a realization of a Gaussian random variable. Thus, it is possible to derive confidence bounds on the prediction. Other adaptive versions of GP can be found in [13, 14, 38].

Fig. 2
figure 2

Structure of a typical CNN

3.3 Convolution neural networks (CNN)

The convolutional neural network is a deep learning feed-forward neural network that has revolutionised computer vision in recent years. It is typically adopted in image and video recognition [56]. The main advantage of the CNN is its inherent ability to perform feature engineering from data automatically without the need for a manual feature engineering procedure, which is a long-standing bottle-neck in traditional machine learning [23]. Figure 2 shows the structure of the typical CNN. Within the model, the input layer develops a feature graph from the input data, which corresponds to the convolution kernel. This kernel uses a set of weights to develop this feature graph. The link between the input and convolution layer is established by a receptive field, which is a square matrix of weights having sizes smaller than the input vector. As the receptive field strides along the input, it executes the convolution operation, described using the equations

$$\begin{aligned} y_{ij} = \sigma \bigg (\sum _{r=1}^F \sum _{c=1}^F w_{rc}x_{(r+1 \times S)(c+j \times S)}+b \bigg ) \end{aligned}$$
$$\begin{aligned} 0 \le i \le \frac{H-F}{S}, \quad 0 \le j \le \frac{W-F}{S} \end{aligned}$$

where \(y_{ij}\) is the output of a node, H and W represent the height (vertical) and width (horizontal) dimensions of the input, respectively. F represents the height and width size of the receptive field; and S denotes the stride length. The term \(x_{(r+1 \times S)(c+j \times S)}\) refers to the input data element with coordinates \({(r+1 \times S)(c+j \times S)}\), and \(w_{rc}\) and b denote the weight positioned on the receptive field and the bias, respectively. Also, \(\sigma\) represents the nonlinear activation function used to extract the features from the input.

Within the convolution layer, the input size \((H \times W \times D)\) is reduced to \([ \frac{H-F+2P}{S+1} \times \frac{W-F+2P}{S+1} \times K]\), where K denotes the number of filters. This process progressively decreases the dimension as the convolution layer stack becomes deeper. The pooling layer performs two key functions: (i) reduce the spatial dimension of the input layer by (typically) up to 75% and (ii) control overfitting.

Fig. 3
figure 3

Multi-task deep neural network adopted in this study

Within this study, a multi-task learning deep neural network architecture is adopted. According to [11], multi-task learning applies an inductive transfer learning mechanism to improve performance in neural networks. In other words, multi-task learning trains tasks in parallel, using a shared or common representation. In multi-task learning, a neural network is trained jointly for multiple tasks and has been proven to improve predictive performance in tasks, with the prerequisite being that the tasks share conceptual similarity, or are not in competition [50]. Figure 3 represents the multi-task learning architecture adopted in this research study. As can be seen, the spine of the network is a fully connected network (FCN) feature extracted convolutional block (the shared convolutional block). Consequently, let us represent the shared convolutional block as:

$$\begin{aligned} f(x; \theta ) \end{aligned}$$

where \(x \in {\mathcal {X}}\) and \(\theta\) represent the parameters of the function f. Therefore, for each output, y, we define an output function \(g_{y}(f(x);\theta _y)\), where \(\theta _y\) are the parameters from the output-specific layer and \(y \in {\mathcal {Y}}\), where \({\mathcal {Y}}\) denotes the set of outputs. For this study, the function f of the shared convolutional block is approximated using a 1-dimensional fully connected convolutional network (FCN), as depicted in Fig. 2. Consequently, the CNN and FCN networks are gold standard state-of-the-art deep neural networks, typically adopted for computer vision and image recognition tasks. However, the FCN and CNN networks have been used in non-image classification by acting as a feature extractor.

Similarly, the model was trained in a greedy, layer-wise manner, which involves successively adding a new hidden layer to the model and refitting, allowing the newly added model to learn the inputs from the existing hidden layer, while keeping the weights for the existing hidden layers fixed. This aids to reduce/eliminate the vanishing gradient problem encountered in neural networks. Consequently, each layer was trained sequentially starting from the bottom layer (input), with each subsequent layer learning a higher-level representation of the layer below [4, 26].

3.4 Multi-layer perceptron (MLP)

Artificial neural networks (ANN) are considered as complex predictive models, due to their ability to handle multi-dimensional data, nonlinearity, and adept learning ability and generalisation [23]. The basic framework of a neural network incorporates four atomic elements, namely: (i) nodes, (ii) connections/weights, (iii) layers, and (iv) activation function. In the MLP, the neurons represent the building blocks. These neurons, which are simple processing units, each have weights that return weighted signals and an output signal, which is achieved using an activation function. The MLP reduces error by optimisation algorithms or functions, such as backpropagation [10, 25, 47].

In an MLP, the set of nodes are connected together by weighted connections, which can be analogous to the coefficients in a regression equation. These weighted connections represent the connecting interactions. The optimal weights of each connection between a set of layers are calculated during each backward pass of a training dataset, which is also used for weight optimisation using the derivatives obtained from the input and predicted values of the training data [24]. The layers represent the network topology, representing neuron interconnections. Within the network, the transfer function or activation function represents the transfer function or state of each neuron. The basic process in a single neuron is presented in Fig. 4. In the MLP, an external input vector is fed into the model during training. In the case of binary classification problems, during the training, the output is clamped to either 0 or 1, via the sigmoid activation function. For this present study, given that the nature of the study was regression-based, real-valued forecasting was performed using real-valued loss functions, such as mean squared error (MSE). A particular variation of neural networks is the feed-forward neural network. This is widely used in modelling many complex tasks, with the generic architecture depicted in Fig. 5. As the figure shows, the elementary model structure comprises three layers, namely the input, hidden, and output layers, respectively. In feed-forward neural networks (FFNN), each individual neuron is interconnected to the output of each unit within the next layer.

Fig. 4
figure 4

Schematic representation of the basic process in a single neuron

Fig. 5
figure 5

Multi-task deep neural network adopted in this study

Consequently, it has been proven that an MLP, trained to minimise a loss or cost function between an input and output target variable using sufficient data, can accurately produce an estimate of the posterior probability of the output classes based on the discriminative conditioning of the input vector, which is the applied approach in this study.

3.5 Random forest

The Random Forest algorithm is an ensemble learning algorithm—these are algorithms that obtain their final results as aggregates of the individual forecasts of the many generated classifiers. In other words, the random forest comprises a collection of \({\mathcal {T}}\) tree-structured classifiers \(\{T_1(X, \theta _1), T_2(X, \theta _2), \dots , T_{{\mathcal {T}}}(X, \theta _{{\mathcal {T}}})\}\), where \(X=\{x_1, x_2, \dots , x_p\}\) is a p-dimensional independent and identically distributed (i.i.d) random vector of input features, and each \(\theta _i \in {\mathbb {R}}\) represents the parameters for each individual classifier, which casts its vote for the most popular class at the input vector X. The output of the ensembles contains \({\mathcal {T}}\) outputs, \(\{\hat{Y}_1=T_1(X), \hat{Y}_2=T_2(X), \dots , \hat{Y}_{{\mathcal {T}}}=T_{{\mathcal {T}}}(X)\}\), where \(\hat{Y}_t=1, \dots , {\mathcal {T}}\) is the predicted class from the t-th tree. The final output is an aggregate of all the predicted classes, which is the class with the majority vote.

The training procedure for the random forest algorithm is as follows. Consider a dataset comprising n samples, \({\mathcal {D}}=\{(X_1,Y_1), (X_2,Y_2), \dots , (X_n,Y_n)\}\), where \(X_i\), \(i=1,2,\dots ,n\) is a vector of input features and \(Y_i\) corresponds to the class label (i.e., True or False in binary classification). Training a random forest on this dataset is as follows.

  1. 1.

    Draw a randomly sampled bootstrap observation (with replacement) from the n observations in the training data.

  2. 2.

    From this bootstrap, grow a tree using the rules: select the best split from a (randomly selected) subset of m features. In other words, keep m as a tuning parameter for the algorithm. Grow the tree until no further split is possible, and the tree is also not pruned back.

  3. 3.

    Repeat the preceding steps until \({\mathcal {T}}\) trees are grown. When \(m=n\), then the best split at each node is selected from all the features.

For this study, given that the focus is on the inference of a numerical outcome data Y, the random forest regressor function from the scikit-learnFootnote 1 package in Python was adopted instead of the classifier. The assumption that the input training data are independently drawn from the joint distribution of (XY) and is made up of \(n(p+1)\) tuples \((x_1, y_1), (x_2, y_2), ..., (x_n, y_n)\). In the regressor, the random forest prediction function is an unweighted average over the collection of individual learners \(h(x) = (1/K) \sum ^K_{k=1} h(\mathbf{x }; \theta _k)\).

3.6 Support vector machine

Classical learning algorithms are trained by minimising the error on the training dataset and this process is called empirical risk minimisation (ERM). Many machine learning algorithms learn using ERM, such as neural networks and regression-based algorithms. However, the support vector machine is based on the structural risk minimisation (SRM) principle, a statistically relevant method [46]. Some studies have proven that this method results in improved generalisation performance, given that SRM is obtained by reducing the upper bound of the generalisation error [9, 16, 49]. The support vector algorithm was developed by Russian statisticians Vapnik and Lerner [52].

To describe the inner working of the SVM, consider input data \(X=\{x_i, x_2, \dots , x_n\}\), where n represents the number of samples having two distinct classes (i.e., True and False). Assume each class associated to label \(y_i=1\) for true and \(y_i=0\) for false. For linear input data, we define a hyperplane \(f(X)=0\) that separates the given data. We define a linear function f of the form:

$$\begin{aligned} f(X) = W^TX + b = \sum ^m_{j=1} w_jx_j+b=0 \end{aligned}$$

where W \(\in\) \({\mathbb {R}}^{n\times 1}\), and b is a scalar. Together, the vector, W, and b can be used to define the position of the hyperplane. The output of the model uses f(X) to create a hyperplane that classifies the input data to either class (i.e., True or False). It is important to note that, for an SVM, the satisfying conditions for the hyperplane can be presented as

$$\begin{aligned} y_if(x_i) = y_i(W^Tx_i+b)\ge 1,i=\{1,2,\dots ,n\} \end{aligned}$$

For nonlinear classification tasks, the kernel-based SVM can be adopted. In this case, the data to be classified are mapped to a high-dimensional feature space where linear separation using a hyper-plane is possible. Consider a nonlinear vector, \(\Phi (X)=(\phi _1(X), \dots , \phi _l(X))\), which can be used to map the m-dimensional input vector X to an l-dimensional feature space. The linear decision function, therefore, used to make this transformation can be given as

$$\begin{aligned} f(X) = sign\bigg (\sum ^n_{i,j=1} \alpha _iy_i(\Phi (X_j) + b\bigg ) \end{aligned}$$

Although using SVM for nonlinear classification by working in the high-dimensional feature space results in benefits, for instance modelling complex problems, there are drawbacks, brought about by excessive computational requirement and overfitting. The SVM described in this section is traditionally a classifier, indicating that it is mainly applied to classifcation problems. However, the problem under investigation is a regression (i.e., real-valued forecasting) problem. For this class of problems, the support vector regression (SVR) algorithm is applied instead. The SVR still adopts the same properties as the SVM, but replaces the decision boundary in the classification problem with a match between the vector and a position in the data plane [20]. Consequently, the support vectors participate to find the closest match between the data points and the actual function representing them.

4 Numerical study

4.1 Description of the parametric stochastic model

A soft-inplane hingeless rotor with four main rotor blades which is similar to the BO105 rotor is addressed in this paper [21]. The results are generated at a non-dimensional forward speed (advance ratio) of 0.3. The rotor has 4 blades, a radius (R) of 4.94 m, hover tip speed (\(\Omega R\)) of 198.12 m/s, chord equal to 8 percent of radius, rotor solidity of 0.10 and a thrust coefficient to solidity ratio of 0.07. The mass per unit length of the blade (\(m_0\)) is 6.46 kg/m. In this section, the influence of material and geometric uncertainties on the (a) cross-sectional stiffness (b) natural frequencies of the non-rotating and rotating blades and (c) aeroelastic response of the composite rotor blade are investigated. In doing so, the intention is to quantify the uncertainty involved in the fabrication process and its effect on the system level response.

The blade flap bending stiffness \((EI_{y})\), blade lag bending stiffness \((EI_{z})\) and blade torsion stiffness (GJ) are considered to be random. These input quantities are representative of simulating the manufacturing variability as they are expressed as functions (product) of material and geometric parameters. It is intuitive to realize that material properties are prone to higher degree of variation than that of geometric parameters during a fabrication process as the former stems from a micro-mechanical model and thus, approximation in homogenization and constitutive laws may lead to modelling uncertainty being propagated to the macro-scale. Additionally, the advanced manufacturing processes have developed significantly higher precision in producing exact geometrical shapes. Having said this, the complexity associated with fabricating irregular geometries may introduce errors.

Therefore, the combined effect of material and geometric parameters are studied, with the rational outcome of the former being more sensitive on the response as it suffers from a higher level of randomness. As the random parameters involve cross-sectional stiffnesses of the rotor blade, it is practical to assume that all realizations corresponding to these parameters will be positive and therefore, they are assumed to be log-normally distributed. However, the ML-based stochastic framework employed here is generalized so that it can deal with all possible probabilistic distributions.

Murugan et al. [39] conducted a systematic study of the effect of uncertainty in composite material properties on the blade stiffnesses. They found that 5 \(\%\) coefficient of variation (C.O.V.) was typical of composite material microlevel properties such as Young’s modulus and Poisson’s ratios. A value of 10 \(\%\) C.O.V. was more representative of the higher level of dispersion which can occur for helicopter blades [42]. When the microlevel properties with 10 \(\%\) C.O.V. were inserted into a composite blade analysis program, the values of C.O.V. for \(EI_y\), \(EI_z\) and GJ were obtained as 6.88, 8.93 and 8.44, respectively. To err on the side of caution, we assume a higher value of 10 \(\%\) C.O.V. on the blade stiffnesses in this paper to account for material uncertainty as well as some defects which may creep into the material during service life. The description of the random parameters is given in Table 1. Note that these quantities are non-dimensionalized by \(m_0 \Omega ^2 R^4\).

Table 1 Description of random parameters considered for the stochastic rotor blade analysis

The aeroelastic response quantities of interest which are studied as part of the stochastic analysis here are: (i) blade first flap rotating frequency (\(\omega _{1}^{f}\)), (ii) blade second flap rotating frequency (\(\omega _{2}^{f}\)), (iii) blade first lag bending frequency (\(\omega _{1}^{L}\)), (iv) blade second lag bending frequency (\(\omega _{2}^{L}\)), (v) blade first torsion frequency (\(\omega _{1}^{T}\)), (vi) power required by main rotor (P), (vii) vibratory longitudinal hub force (\(f_{x}^{4\Omega }\)), (viii) vibratory lateral hub force (\(f_{y}^{4\Omega }\)), (ix) vibratory vertical hub force (\(f_{z}^{4\Omega }\)), (x) vibratory rolling hub moment (\(m_{x}^{4\Omega }\)), (xi) vibratory pitching hub moment (\(m_{y}^{4\Omega }\)), (xii) vibratory yawing hub moment (\(m_{z}^{4\Omega }\)), (xiii) vibration objective function (J) defined in Eq. (25), and (xiv) lowest damped stability mode (Damping).

$$\begin{aligned} \begin{aligned} J = \sqrt{ (f_{x}^{4\Omega })^2 + (f_{y}^{4\Omega })^2 + (f_{z}^{4\Omega })^2} + \sqrt{(m_{x}^{4\Omega })^2 +(m_{y}^{4\Omega })^2 + (m_{z}^{4\Omega })^2} \end{aligned} \end{aligned}$$

It can be observed from Eq. (25), that J can be evaluated readily from the predicted force and moment terms. In spite of this, we attempt to capture the trend of J separately as an individual quantity as it is commonly used as objective function in aeroelastic design optimization frameworks [35]. This is intended for assessing the accuracy of ML techniques in capturing the stochastic trend of J. If the approximation quality is found to be satisfactory, there may be a huge scope towards potential application of ML in aeroelastic optimization under uncertainty frameworks in future.

For training all the ML models, 100 training points had been initially generated using a Latin hypercube sampling (LHS) scheme [37]. This was implemented with the help of the ”lhsdesign” built-in function of MATLAB and the ”maximin” option which maximises the minimum distance between points. However, it was observed that the actual FE model of the rotor blade did not converge for 4 of those points in the experimental design. Therefore, the ML models were eventually trained on the basis of the remaining 96 realizations of stochastic input parameters and converged aeroelastic responses.

4.2 Adaptive Gaussian process

The MATLAB-based DACE platform was employed to implement the GP model in this work [33]. The squared exponential correlation function was used to construct the GP and 10-fold cross-validation (CV) was performed to access the model accuracy. The starting point, lower bound and the upper bound of the hyperparameters in the correlation function were adopted as, [10 10 10], [0.1 0.1 0.1] and [20 20 20], respectively, for each of the response quantities.

The root mean squared error (RMSE) obtained by GP is reported in Table 2. Note that the numerical values for all the response quantities other than the natural frequencies are small in magnitude (ranging from \(10^{-1}\) to \(10^{-5}\)) as they are non-dimensional, and therefore, a non-relative statistical error metric (without denominator) such as RMSE has been selected in this work. It can be observed from the RMSE values in Table 2, that GP achieved a decent level of accuracy in estimating most of the response quantities. Since RMSE is not a relative error metric, in order to access the level of accuracy achieved for the respective response quantity, an idea of the magnitude of that response quantity is necessary. This can be obtained by referring to the mean values of the actual or predicted response quantities corresponding to a test dataset reported in Table 4. In particular, while approximating the following four response quantities \(f_{z}^{4\Omega }\), \(m_{z}^{4\Omega }\), J, and damping, relatively high errors by GP were observed due to their nonlinear fluctuations.

As an attempt to improve the approximation of GP for the above four response quantities, an adaptive sampling strategy by using the inherent predictive variance feature of GP is proposed. This is explained as follows:

  • A GP model is built on the 96 input design points and for each of the above four response quantities (evaluated at these 96 design points).

  • 10,000 random realizations of input variables are generated by MCS and the above GP model is used to predict the four response quantities corresponding to the realizations. The MSE by GP is also obtained corresponding to these 10,000 points with the help of the predictive variance feature of GP.

  • The MSE obtained is normalized over the mean of the respective predicted response quantity and sorted in descending order.

  • It is observed for all of the four response quantities, only the first 1000 MSE values are relatively large compared to the remaining 9000 values. So, only the first 1000 MSE values are taken into consideration to focus on the highly erroneous regions.

  • The realizations of the input variables corresponding to the first 1000 MSE values are stored. This is to identify the region in the input design space where the GP predictor does not capture the response trend adequately and therefore, needs additional sample points in those regions.

  • With the help of these 1000 input design points identified in the above step, the input space is partitioned into 3 clusters, corresponding to each of the four response quantities. This grouping is done according to the magnitude of the error obtained while approximating the response quantities. The purpose is to provide most number of points in the first cluster which consists of the input points for which maximum error was achieved. Similarly, the second cluster is represented by fewer points than the first cluster and more points than the third cluster. The third cluster consists of the least number of points as it corresponds to the input space for which minimum error was achieved. Intuitively the size of the original cluster will vary according to the quality of the approximation of the respective response quantity. Thus, the initial size of the first cluster of \(f_{z}^{4\Omega }\) and \(m_{z}^{4\Omega }\) is not necessarily identical.

  • Next, the number of representative points of each cluster are obtained by the k-means clustering technique, implemented using the built-in MATLAB function ’kmeans’. For each of the response quantities, the first, second and third clusters consist of 10, 6 and 4 points, respectively. Thus, the input design space of cluster 1 is represented by more points (fine partitioning) so as to capture the response trends precisely in the erroneous regions compared to the second and third clusters. These 20 additional input points generated for each of the four response quantities are shown in Fig. 6.

Fig. 6
figure 6

Adaptive input sample generation with predictive variance feature of GP and k-means clustering

Responses from the actual FE model are calculated for the above 80 additionally generated input points. However, 78 responses could be obtained as \(m_{z}^{4\Omega }\) did not converge for two points. In order to check the improvement in the GP predictions of the four response quantities, 10-fold CV is performed on (96+20) samples of \(f_{z}^{4\Omega }\), J, and damping and (96+18) samples of \(m_{z}^{4\Omega }\). The RMSE values obtained for \(f_{z}^{4\Omega }\) improve from 0.00161 to 0.00107, \(m_{z}^{4\Omega }\) improves from 0.00109 to 0.00071, J improves from 9.13 \(\times 10^{-6}\) to 8.51 \(\times 10^{-6}\) and damping improves from 0.01279 to 0.00924.

For testing the performance of all the ML techniques trained on the initial 96 samples, the above 78 additionally generated points are used. This can be viewed as a type of concise MCS as all of the above 78 points resulted from applying k-means clustering on 10,000 MCS samples (as discussed previously). Moreover, these 78 points correspond to the input design space where the GP prediction was erroneous, thus, it is expected that all the ML techniques will be put to severe test upon subjecting them to this test dataset.

4.3 Convolution neural network: implementation details

In this study, the FCN neural network adopted is (see Fig. 2) a multi-task learning approach as depicted in Fig. 3. For its implementation, the Python-based Tensorflow software has been used [1]. In our model, the shared convolutional block had a CNN and Max pooling layer, followed by a fully connected (i.e., dense) layer and the regression layer. The CNN had 128 filters, with a kernel of length \((1\times 2)\). A pooling layer of size \((1\times 2)\) is applied after the CNN layer, after which a flatten layer is applied to transform the extracted features from the CNN and pooling layers to a one-dimensional vector. The fully connected (dense) layer is applied after the flatten layer to perform representation learning between the one-dimensional vector and labels. Finally, the regression layer is applied with a linear activation function to learn to make the inferences. In order to make the input data compatible with the CNN, it must be transformed to a manner that can be accepted as input. For CNNs, two-dimensional data inputs (i.e., having \(n_{rows} \times n_{columns}\)) must be transformed to 3-dimensional tensors, corresponding to (\(n_{timesteps} \times n_{rows} \times n_{columns})\). Therefore, for this current study, each data point in the original input with dimension \((1 \times 14)\) (i.e., number of responses) is reshaped to a 3-dimensional image corresponding to \((1 \times 1 \times 14)\), treating it as a single instance of an image comprising \((1 \times rows \times columns)\). During model training, input data with three variables are used to train the model in a shared training approach, which is multi-task learning, in such a manner that the predicted responses are all learnt in a single training regime.

To optimise the parameters within a model, stochastic gradient-based optimisation algorithms are generally used. For this study, the Adam optimizer was adopted. The learning rate value was determined as \((1\times 10^{-6})\) using a grid search mechanism. This study adopted a loss function based on the RMSE. Therefore, the RMSE was calculated on the training data to update the model parameters with each iteration (epoch).

The mini-batch stochastic gradient descent was applied using the Adam optimiser to minimise the RMSE. The performance of deep neural networks depends on predetermined hyperparameters, which are obtained using an optimization process. Unlike model parameters, which are learned using an optimization function to minimise an objective (or loss) function, hyperparameters are not learned during the model training. Many hyperparameter optimization methods exist, such as random search, grid search, and Bayesian optimization. However, for this article, we applied a grid search framework for hyperparameter optimization of all the machine learning models adopted [?].For this study, the hyperparameter optimization method can be described in the following manner: Consider a dataset \(\Phi\), with an index of n possible hyperparameters h. The grid search method simply requires the selection of a set of values for each hyperparameter \((h_1 \dots h_k)\) that minimizes the validation loss. In other words, the grid search algorithm executes all the possible combinations of values in a ’grid’ format, such that the number of trials in a grid search is \(S=\prod ^n_{n-1}|h^{(k)}|\).

The information on the trainable parameters is provided as follows. First layer is the input layer, so the input is a \(1 \times 3\) tensor. In the second layer (i.e., first convolution layer), the input to the layer is the output from layer 1 and since the filter size for convolution layer 1 is \((1 \times 2)\), the number of parameters in this layer is \((((1 \times n_{input} \times filter_{size})+ bias_{parameter}) \times n_{filter})\), which is \((1 \times 3 \times 2)+ 1) \times 128) = 896\). For the dense (fully-connected) layers, since each layer has 32 units, the number of trainable parameters for each layer is calculated as \((1 \times n_{inputfromCNN})+ bias_{parameter}) \times n_{units})\), which is \((((1 \times 128)+ 1) \times 32)\), which is 4,128.

As it is evident from the above calculations that the number of trainable parameters are significantly high and may lead to overfitting in the model. The dropout technique was applied to control the overfitting in the model. Specifically, a dropout of 0.2 (20 %) was applied in training the deep learning model. Also, some part of the training data (10 %) was allocated for model validation, using the shuffle method (i.e., randomly shuffling the training dataset).

4.4 Multi layer perceptron: implementation details

In this study, we adopted an MLP, which had a shared neural network block, and one hidden layer of densely connected neurons, made of 32 units in Tensorflow [1]. The network adopted the Adam optimiser, and a learning rate of \((1\times 10^{-3})\). Just as with the CNN, the loss function adopted was the RMSE. The model was trained for 1,000 epochs. Similar to the deep CNN model training described in Sect. 4.3, the MLP was trained using similar parameters for the optimiser. Consequently, the model training regime was run for 300 epochs with a batch size of 16 and learning rate, \(\alpha =1 \times 10^{-3}\). The first-moment exponential decay was \(\beta _1=0.001\), while the second-moment exponential decay was set as \(\beta _2=0.999\).

The number of parameters in each MLP layer is calculated using the formula \((n_{units} \times n_{features})\), which is \((32 \times 2) = 64\). Note that \(n_{features}\) refers to the 2 connections among the 3 inputs. For the second (fully connected) block, each layer, fully connected to a response variable has \((n_{units}+bias_{parameter})\) parameters, which is \((32+1)=33\) parameters. The dropout scheme adopted for the MLP model was the same as that of the CNN model to limit the overfitting.

4.5 Random forest: implementation details

The random forest model in Tensorflow [1] was trained using an input dataset and 10-fold CV for model parameter tuning to ensure generalisation. For the specific model adopted in this study, we selected to train a fixed number of trees in the forest. For this, the number of trees was set as 100 and given that – as earlier stated – the random forest is an ensemble method that is trained by creating multiple decision trees, the number of trees parameter is used to specify the number of trees to be used in the process. In this study, given that the total number of features \(m=3\) is relatively small, the study adopted a bagging (bootstrap aggregation) method of training the algorithm. To train our model, we adopted the MSE to be used in measuring the quality of the split, which is equivalent to variance reduction in a feature selection regime.

4.6 Support vector machine: implementation details

As previously stated, the support vector regressor maps the training data into a higher-dimensional feature space, using a function and subsequently computing a hyperplane that maximises the distance between the margins of the target feature. However, the support vector regressor has many parameters that must be set for accurate parameter forecasting. These parameters, which are not optimised with model training, are referred to as hyperparameters. For this study, arriving at an optimal configuration for the hyperparameters was achieved using a grid search framework. For the SVR, the key hyperparameters include the kernel type, the kernel coefficient, the regularization parameter, and the epsilon value \(\epsilon\). Consequently, the selected \(\epsilon\) value for this study was set as 1.0, while the kernel function used was the radial basis function (RBF). The kernel function used was defined as \(\gamma = 1/((n_{features}) \times X_{variance})\), where \(n_{features}\) refers to the number of input features (i.e., 3) and \(X_{variance}\) denotes the variance of these input features. For implementation, Tensorflow software was utilized [1].

Note that for this study, the multitask learning framework is only applied to the deep learning models (CNN and MLP), primarily to reduce the training time required to train a model for each output response. Given that the other shallow learning models trained relatively quickly, it was not very time consuming to loop through the individual output responses in each training cycle.

4.7 Results and discussion

The RMSE obtained from performing 10-fold CV by different ML techniques is presented in Table 2. The lowest RMSE values corresponding to each stochastic response quantity have been indicated in bold and thereby illustrating the best performing ML technique. From the results obtained in Table 2, it can be observed that out of all the ML techniques, GP and CNN are the most accurate. The results obtained by all the ML techniques on the test dataset are presented in terms of boxplots in Fig. 7 and RMSE values in Table 3. Figure 7 and Table 3 reveal that in addition to GP and CNN, MLP also achieves a satisfactory level of accuracy. The response statistics (mean and standard deviation) of the stochastic response quantities are reported in Table 4.

It can be observed from Table 4 that the standard deviation is high for the first torsion frequency, the second flap frequency and the second lag frequency. The first lag and flap frequencies show a low effect of the elastic stiffness uncertainty due to their strong dependence on the rotation speed. Vibration levels can increase substantially when the rotor frequencies approach multiples of the main rotor speed. Regions for the safe operation of the main rotor in terms of RPM are selected by carefully avoiding the reasons where rotating frequencies approach the multiples of the rotating speeds. The results in this paper show that an uncertainty analysis must be conducted to ensure that material uncertainty does not cause frequency shifts which can result in high vibration levels.

As can be expected, the effect of uncertainty of the stiffnesses on the rotor power is much less, as this is a higher order effect. The six vibratory loads consists of three vibratory forces and moments acting on the rotor hub. Vibratory hub loads transmitted by the main rotor to the fuselage is the main cause of vibration. The three vibratory forces are the longitudinal, lateral and vertical forces and are indicated by subscripts x, y and z, respectively. The three vibratory moments are the rolling, pitching and yawing moment and are indicated by subscripts x, y and z, respectively. The substantial effect of uncertainty can be clearly observed on the six vibratory hub loads. In particular, a high impact of uncertainty relative to the mean is seen in the yawing hub moment. The cumulative effect of uncertainty on the helicopter is shown in J, and again it can be seen to be quite substantial relative to the mean. Considerable effect of uncertainty is also shown in the damping. Damping in the modes for the periodic system is indicative of the possibility of the aeroelastic instability known as flutter. Typically, flutter occurs when damping becomes negative and this is a self excited oscillation which can cause the amplitude of motion of the rotor to increase inexorably until failure. While lag dampers are often used to alleviate damping, the uncertainty results show that sufficient factor of safety must be used in lag damper design to alleviate the effect of perturbation in the damping simulation results due to uncertainty in the material properties.

These results indicate that a robust and reliability design optimization approach is needed for helicopter optimization. The GP, CNN and MLP methods are shown in this paper to be the most suitable for performing uncertainty quantification for such problems. Note that typically, vibration is minimised using the objective function J and constraints are imposed on the blade rotating frequencies and damping. The damping should remain positive and the frequencies should be kept away from multiples of the main rotor speed. Uncertainty can cause a deterministic design to become infeasible. From a practical perspective, uncertainty quantification allows a systematic approach to determine margins of safety which can be used in design for frequencies, vibratory hub loads and aeroelastic damping. The use of uncertainty quantification also prevents the need for overly conservative designs based on high values of factor of safety which can lead to excess weight and the resulting deleterious consequences for a flight vehicle structure.

Table 2 RMSE values obtained from performing 10-fold CV by different ML techniques
Fig. 7
figure 7

Boxplots of the stochastic response quantities corresponding to the test dataset by (a) actual FE-based simulations (b) GP (c) CNN (d) MLP (e) RF (f) SVR

Table 3 RMSE values obtained from approximation of the test dataset by different ML techniques
Table 4 Response statistics corresponding to the test dataset by different ML techniques. The response statistics are reported as mean (+/− standard deviation)

5 Summary and conclusions

The novelty of the work lies in the application of advanced data-driven learning techniques, such as convolution neural networks and multi-layer perceptron, random forests, support vector machines and adaptive Gaussian process and utilizing their multi-layered structure for capturing the nonlinear response trends to develop an efficient grey-box physics-informed ML framework for stochastic rotor analysis. Specifically, this work improves upon the accuracy aspect by metamodelling the nonlinear stochastic rotor response trends by entailing limited expensive-to-generate physics-based simulations from detailed FE models. Thus, the work is of practical significance as (i) it accounts for manufacturing uncertainties, (ii) accurately quantifies their effects on nonlinear response of rotor blade and (iii) makes the otherwise computationally prohibitive simulations viable by the use of ML.

A comparative assessment of advanced deep and shallow supervised learning techniques is presented. These data-driven techniques have been trained to learn from the stochastic aeroelastic response trends and build corresponding physics-based meta-models of the system, thereby eliminating the need to perform high-fidelity simulations on the actual FE model. For simulating the manufacturing variability, the combined effect of material and geometric randomness have been taken into account.

Important findings from the results obtained in this study include:

  • In general, high sensitivity of the rotor aeroelastic output responses to the input elastic stiffness uncertainty reveals that considering manufacturing variability in analyzing helicopter rotors is pivotal to simulate their actual behaviour.

  • To be specific, few response parameters like the first torsion frequency, vibratory hub loads and damping have a substantial effect due to the input perturbations. The highest sensitivity has been observed in the yawing hub moment. This suggests that sufficient factor of safety should be considered in the rotor design to (i) prevent frequency shifts which can result in high vibration levels and, (ii) avoid the occurrence of the aeroelastic instability condition known as flutter and accordingly, design lag dampers which are used to alleviate damping.

  • The results achieved highlight the fact that CNN and GP are the most accurate models followed by MLP. RF and SVM mostly failed to capture the response trends, with a very few exceptions where some response quantities were decently predicted. The accuracy obtained by CNN, GP and MLP is worth acknowledging as (i) a high proportion of variation in the input parameters was considered (ii) the prediction test dataset consisted of the points from the stochastic input space where GP initially proved to be vulnerable. Additionally, an adaptive sampling strategy was devised by using the predictive variance feature of GP (i) to improve the accuracy by adding a nominal number of points to the experimental design and (ii) the additional points generated were used to create the test dataset in which the other models could be validated.

For extending this research in a future direction, it will be worth investigating the effect of uncertainties on different rotor models. This will create additional datasets based on different physical insights on the structural system. Also, this work does not account for spatially varying uncertainties, which may be prevalent for helicopter rotor models. This will require integration of random field models for the stochastic elastic stiffness parameters with the present computational framework and is a potential direction for future investigation. Since a decent level of accuracy is achieved by CNN, GP and MLP, these machine learning models can be extended for applications of optimization under uncertainty of composite rotor blades. Although the deep learning techniques may be hard to train, once the ideal model configuration is achieved, they can easily be employed to solve more expensive problems such as the optimal design of the blades. The capability of these methods in operating in high-dimensional spaces will be advantageous to GP and conventional surrogate modelling approaches which easily tend to collapse in these complex scenarios. Therefore, deep and shallow neural net driven robust or reliability based design of composite helicopter rotor blades for vibration control will be an interesting extension of this present work. One of the approaches to solve the optimization problem can be minimizing the vibration (denoted by the term J) with the constraints imposed on the blade rotating frequencies and damping to ensure the frequencies are kept away from the multiples of the main rotor speed and the damping remains positive.