This chapter is a synthesis of the previous ones, since many of the concepts introduced earlier are applied herein. The complete ROM-net workflow, described in Sect. 2.4.2, is applied to the quantification of the uncertainty of dual quantities (such as the accumulated plastic strain and the stress tensor) on a real-life turbine blade, induced by the uncertainty of the temperature loading field. The numerical experiments make use of the codes Mordicus and genericROM, introduced respectively in Sects. 4.1 and 4.2. The content of this chapter is inspired by our publication [9].

Computing the fatigue lifetime of one such blade requires simulating its behavior until the stabilization of the mechanical response, which would last several weeks using Abaqus [23] because of the size of the mesh, the complexity of the constitutive equations, and the number of loading cycles in the transient regime. With such a computation time, uncertainty quantification with the Monte Carlo method is unaffordable. In addition, such simulations are too time-consuming to be integrated in design iterations, which limits them to the final verification steps, while the design process still relies on simplified models. Accelerating these complex simulations while maintaining satisfactory accuracy is a key challenge, as it would provide useful numerical tools to improve design processes and to quantify the effect of uncertainties on the environment of the system.

Simulations are accelerated using a dictionary of reduced-order models, with a classifier able to select which local reduced-order model to use for a new temperature loading. A dataset of 200 solutions is computed, in a finite element approximation space whose dimension is on the order of a million, for various instances of the temperature field loading, in parallel in 7 days and 9 h on 48 cores. These solutions are computed over 11 time steps in the first cycle, using a scalable Adaptive MultiPreconditioned FETI (AMPFETI) solver [3] in the Z-set finite-element software [17]. The dataset is partitioned into two clusters using a k-medoids algorithm with a ROM-oriented dissimilarity measure in 5 min; the corresponding local ROMs, using POD data compression and ECM operator compression, are trained in 2 h and 30 min. An automatic reduced-model recommendation procedure, deciding which local ROM to use for a new temperature loading, is trained in the form of a logistic regression classifier in 16 min. A meta-model is used to reconstruct the dual quantities of interest over the complete mesh from their values at the reduced integration points, in the form of a multi-task Lasso, which takes 1 h to train for 14 dual fields. The uncertainties on dual quantities of interest, such as the accumulated plastic strain and the stress tensor, are quantified by applying our trained ROM-net to 1008 Monte Carlo draws of the temperature loading field in 2 h and 48 min, which corresponds to a speedup greater than 600 with respect to our highly optimized domain decomposition AMPFETI solver. The expected values of the Von Mises stress and the accumulated plastic strain have 0.99-confidence intervals whose widths are respectively 1.66% and 2.84% of the corresponding estimated expected value. As a verification stage, 20 reference solutions are computed for new temperature loadings, and the dual quantities of interest are predicted with relative errors on the order of 1% to 2%, while the location of the maximum value is perfectly predicted.

In what follows, we describe the industrial dataset, the hypotheses of the model, and the objective of the present study. The proposed workflow for uncertainty quantification is then applied to this industrial configuration.

5.1 Industrial Context

The industrial test case of interest consists in predicting the mechanical behavior of a high-pressure (HP) turbine blade in an aircraft engine with uncertainties on the thermal loading. The industrial context and the models for the mechanical behavior and the thermal loading are presented, with a particular emphasis on the assumptions that have been made. For confidentiality reasons, mesh sizes and numerical values corresponding to the industrial dataset are not given, and the provided figures and plots do not contain any color map or physical numerical value. The accuracy of the predictions made by our methodology is given in the form of relative errors.

5.1.1 Thermomechanical Fatigue of High-Pressure Turbine Blades

High-pressure turbine blades are critical parts in an aircraft engine. Located downstream of the combustion chamber, they are subjected to extreme thermomechanical loadings resulting from the combination of centrifugal forces, pressure loads, and hot turbulent fluid flows whose temperatures are higher than the material’s melting point. The repeating thermomechanical loading over time progressively damages the blades and leads to crack initiation under thermomechanical fatigue. Predicting the fatigue lifetime is crucial not only for safety reasons, but also for ecological issues, since reducing fuel consumption and improving the engine’s efficiency requires increasing the temperature of the gases leaving the combustion chamber.

High-pressure turbine blades are made of monocrystalline nickel-based superalloys that have good mechanical properties at high temperatures. To reduce the temperature inside this material, the blades contain cooling channels in which relatively fresh air coming from the compressor flows. In addition, the blade’s outer surface is protected by a thin thermal barrier coating. In spite of these advanced cooling technologies, the rotor blades undergo centrifugal forces at high temperatures, causing inelastic strains. Under this cyclic thermomechanical loading repeated over the flights, the structure has a viscoplastic behavior and reaches a viscoplastic stabilized response, where the dissipated energy per cycle still has a nonzero value. This is called plastic shakedown, and leads to low-cycle fatigue. At cruise flight, the persistent centrifugal force applied at high temperature induces progressive (or time-dependent) inelastic deformations: this phenomenon is called creep. In addition, the difference between gas pressures on the extrados and the intrados of the blade generates bending effects. Environmental factors may also locally modify the chemical composition of the material, leading to its oxidation. As oxidized parts are more brittle, they facilitate crack initiation and growth. Thermal fatigue resulting from temperature gradients is another life-limiting factor. Temperature gradients make colder parts of the structure prevent the thermal expansion of hotter parts, creating thermal stresses. Due to their higher temperatures, the hot parts are more viscous and have a lower yield stress, which makes them prone to develop inelastic strains in compression. When the temperature cools down after landing, tensile residual stresses appear in parts which were compressed at high temperatures and favor crack nucleation. Given the complex temperature field resulting from the internal cooling channels and the turbulent gas flow, thermal fatigue has a strong influence on the turbine blade’s lifetime. In particular, during transient regimes such as take-off, an important temperature gradient appears between the leading edge and the trailing edge of the blade, since the latter has a low thermal inertia due to its small thickness and thus warms up faster.

In short, the behavior of a high-pressure turbine blade results from a complex interaction between low-cycle fatigue, thermal fatigue, creep, and oxidation. Due to the cost and the complexity of experiments on parts of an aircraft engine, numerical simulations play a major role in the design of high-pressure turbine blades and their fatigue lifetime assessment. All this knowledge has been accumulated by scientists and engineers over the last decades. In the proposed approach to machine learning for model order reduction, all this knowledge is preserved in the local ROMs. Moreover, uncertainty propagation complements this valuable traditional knowledge. We do not expect artificial intelligence to learn everything in our modeling process.

5.1.2 Industrial Dataset and Objectives

Figure 5.1 gives the geometry and the finite-element mesh of a real high-pressure turbine blade. The mesh is made of quadratic tetrahedral elements, and contains a number of nodes on the order of a million. The elasto-viscoplastic mechanical behavior is described by a crystal plasticity model presented in the appendix of [9]. As explained above, Monte Carlo simulations using a commercial software such as Abaqus are unaffordable. With the help of domain decomposition methods, the computation time can be reduced by solving the equilibrium equations in parallel on different subdomains of the geometry. Using the implementation of the Adaptive MultiPreconditioned FETI solver [3] in the Z-set finite-element software [17], the simulation of one single loading cycle of the HP turbine blade with 48 subdomains takes approximately 53 min.

Fig. 5.1

High-pressure turbine blade geometry and mesh (micro-perforations are not modeled) [9]

The objective is to use a ROM-net to quantify uncertainties on the mechanical behavior of the high-pressure turbine blade, given uncertainties on the thermal loading. The reduction of the computation time should enable Monte Carlo simulations for uncertainty quantification. Note that we are not interested in predicting the state of the structure after a large number of flight-representative loading cycles: only one cycle is simulated. Cyclic extrapolation of the behavior of a high-pressure turbine blade has been studied in [4] and is out of the scope of this section.

5.1.3 Modeling Assumptions

It is assumed that the heat produced or dissipated by mechanical phenomena has negligible effects in comparison with thermal conduction, which enables avoiding strongly coupled thermomechanical simulations and running thermal and mechanical simulations separately instead. Under a weak thermomechanical coupling, the first step consists in solving the heat equation to determine the temperature field and its evolution over time. The temperature field history defines the thermal loading and is used to compute thermal strains and temperature-dependent material parameters for the mechanical constitutive laws. Once the thermal loading is known, the temperature-dependent mechanical problem must be solved in order to predict the mechanical response of the structure.

The thermomechanical loading applied to the high-pressure turbine blade during its whole life is modeled as a cyclic loading, with one cycle being equivalent to one flight. The rotation speed of the turbine’s rotor is proportional to a periodic function of time \(\omega (t)\) whose evolution over one period (or cycle, see Fig. 5.2) is representative of one flight with its three main regimes, namely take-off, cruise, and landing. The period (or duration of one cycle) is denoted by \(t_c\). The rotation speed between flights k and \(k+1\) is zero, which means that \(\omega (k t_c) = 0\) for any integer k. The rotation speed \(\omega (t)\) is scaled so that its maximum is 1.

Fig. 5.2

Function \(\omega (t)\) defining one cycle for the rotation speed [9]

Let \(\Omega \subset \mathbb {R}^3\) denote the solid body representing the high-pressure turbine blade, with \(\partial \Omega \) denoting its outer surface. Let \(\partial \Omega ^p \subset \partial \Omega \) be the surface corresponding to the intrados and extrados. The thermal loading is defined as:

$$\begin{aligned} \forall {\boldsymbol{\xi }} \in \Omega , \quad \forall t \in \mathbb {R}_{+}, \qquad T({\boldsymbol{\xi }}, t) = (1 - \omega (t))T_{0} + \omega (t) T_{\max }({\boldsymbol{\xi }}), \end{aligned}$$
(5.1)

where \(T_{0} = 293 \ \textrm{K}\) and \(T_{\max }\) is the temperature field obtained when the rotation speed reaches its maximum. This field \(T_{\max }\) is obtained either by an aerothermal simulation or by a stochastic model, as explained later. Similarly, the pressure load applied on \(\partial \Omega ^p\) reads:

$$\begin{aligned} \forall {\boldsymbol{\xi }} \in \partial \Omega ^p , \quad \forall t \in \mathbb {R}_{+}, \qquad p^{\partial \Omega }({\boldsymbol{\xi }}, t) = (1 - \omega (t))p^{\partial \Omega }_{0} + \omega (t) p^{\partial \Omega }_{\max }({\boldsymbol{\xi }}), \end{aligned}$$
(5.2)

where \(p^{\partial \Omega }_{0} = 1 \ \text {atm}\) is the atmospheric pressure at sea level, and where \(p^{\partial \Omega }_{\max }\) is the pressure field obtained when the rotation speed reaches its maximum. The clamping of the blade’s fir-tree foot on the rotor disk is modeled by displacement boundary conditions that are not detailed here.
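For illustration, the cyclic loading of Eqs. (5.1) and (5.2) can be evaluated as in the minimal Python sketch below; the array and argument names (T_max, p_max, omega_t) are hypothetical and are not part of the simulation codes mentioned in this chapter.

```python
import numpy as np

def thermal_loading(T_max, omega_t, T0=293.0):
    """Eq. (5.1): interpolation between the rest temperature T0 (in K) and the nodal
    field T_max reached at maximum rotation speed, driven by omega(t) in [0, 1]."""
    return (1.0 - omega_t) * T0 + omega_t * np.asarray(T_max)

def pressure_loading(p_max, omega_t, p0=101325.0):
    """Eq. (5.2): same interpolation for the pressure field on the intrados/extrados
    surface; p0 is 1 atm expressed in Pa."""
    return (1.0 - omega_t) * p0 + omega_t * np.asarray(p_max)
```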

Small geometric details of the structure have been removed to simplify the geometry. Nonetheless, the main cooling channels are considered. The effects of the thermal barrier coating (TBC) have been integrated in aerothermal simulations, but the TBC is not considered in the mechanical simulation although its damage locally increases the temperature in the nickel-based superalloy and thus affects the fatigue resistance of the structure. Additional centrifugal effects due to the TBC are not taken into account.

The predicted mechanical response of the structure depends on many different factors. Below is a nonexhaustive list of influential factors that are possible sources of uncertainties in the numerical simulation:

  • Thermal loading: The viscoplastic behavior of the nickel-based superalloy is very sensitive to the temperature field and its gradients. However, the temperature field is not accurately known because of the impossibility of validating numerical predictions experimentally. Indeed, temperature-sensitive paints are accurate to within \(50 \ \textrm{K}\) only, and they do not capture a real surface temperature field since they measure the maximum temperature reached locally during the experiment.

  • Crystal orientation: Because of the complexity of the manufacturing process of monocrystalline blades, the orientation of the crystal is not perfectly controlled. As the superalloy has anisotropic mechanical properties, defects in crystal orientation highly affect the location of damaged zones in the structure.

  • Mechanical loading: The centrifugal forces are well known because they are related to the rotation speed, which is easy to measure. On the contrary, pressure loads are uncertain because of the turbulent nature of the incoming fluid flow. However, the effects of pressure load uncertainties on the mechanical response are less significant than those of the thermal loading and crystal orientation uncertainties.

  • Constitutive laws: Uncertainties on the choice of the constitutive model, the relevance of the modeling assumptions, and the values of the calibrated parameters involved in the constitutive equations also influence the results of the numerical simulations.

For simplification purposes, the only source of uncertainty considered in this work is the thermal loading. The equations of the mechanical problem are then seen as parametrized equations, where the parameter is the temperature field \(T_{\max }\) (see Eq. (5.1)) obtained when the rotation speed reaches its maximum value. The dimension of the parameter space is then the number of nodes in the finite-element mesh. The mechanical loading is assumed to be deterministic. Together with the crystal orientation and the constitutive laws and their parameters (or coefficients), it is considered as known data describing the context of the study, given by experts. For details on the constitutive model, we refer to the appendix of [9].

5.1.4 Stochastic Model for the Thermal Loading

A stochastic model is required to take into account the uncertainties on the thermal loading. Given the definition of the thermal loading in Eq. (5.1), we only need to model uncertainties in space through the field \(T_{\max }\) obtained when the rotation speed reaches its maximum value. The random temperature fields must satisfy some constraints: they must satisfy the heat equation, and they must not take values out of the interval \([ 0 \ \textrm{K} ; T_{\text {melt}} ]\), where \(T_{\text {melt}}\) is the melting point of the superalloy. These random fields are obtained by adding random fluctuations to a reference temperature field, see Fig. 5.3. The reference field comes from aerothermal simulations run with the software Ansys Fluent. The data-generating distribution is defined as a Gaussian mixture model made of two Gaussian distributions with the same covariance function but with distinct means, and with a prior probability of 0.5 for each Gaussian distribution. The Gaussian distributions are obtained by taking the first four eigenfunctions of the covariance function (see the Karhunen-Loève expansion [16]), with a standard deviation of \(15 \ \textrm{K}\). Therefore, realizations of the random temperature field read:

$$\begin{aligned} \forall {\boldsymbol{\xi }} \in \Omega , \quad T({\boldsymbol{\xi }}) = T_{\text {ref}}({\boldsymbol{\xi }}) + \Upsilon _{0} \ \delta T_0 ({\boldsymbol{\xi }}) + \sum _{i=1}^{4} \Upsilon _{i} \ \delta T_i ({\boldsymbol{\xi }}), \end{aligned}$$
(5.3)

where \(T_{\text {ref}}\) is the reference field, \(\delta T_0 \) is a temperature perturbation at the trailing edge whose maximum value is \(50 \ \textrm{K}\), \(\{ \delta T_i \}_{1 \le i \le 4}\) are fluctuation modes, \(\Upsilon _{0}\) is a random variable following the Bernoulli distribution with parameter 0.5, and \(\{ \Upsilon _i \}_{1 \le i \le 4}\) are independent and identically distributed random variables following the standard normal distribution \(\mathcal {N}(0,1)\). The variable \(\Upsilon _{0}\) is also independent of the other variables \(\Upsilon _i\). The different fields involved in Eq. (5.3) can be visualized in Fig. 5.3. Equation (5.3) defines a mixture distribution with two Gaussian distributions whose means are \(T_{\text {ref}}\) and \(T_{\text {ref}} + \delta T_0\). We voluntarily define this mixture distribution with \(\delta T_0\) adding \(50 \ \textrm{K}\) in a critical zone of the turbine blade in order to check that our cluster analysis can successfully detect two relevant clusters, i.e., one for fields obtained with \(\Upsilon _{0}(\theta ) = 0\) and one for fields obtained with \(\Upsilon _{0}(\theta ) = 1\). Indeed, the temperature perturbation \(\delta T_0\) is expected to significantly modify the mechanical response of the high-pressure turbine blade. All the fields \(\{ \delta T_i \}_{0 \le i \le 4}\) satisfy the steady heat equation like \(T_{\text {ref}}\), which ensures that the random fields always satisfy the heat equation under the assumption of a linear thermal behavior. For nonlinear thermal behaviors, Eq. (5.3) would define surface temperature fields that would be used as Dirichlet boundary conditions for the computation of bulk temperature fields. The assumption of a linear thermal behavior is adopted here to avoid solving the heat equation for every realization of the random temperature field.
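As an illustration of Eq. (5.3), the sketch below draws realizations of the random temperature field; the array names (T_ref, dT0, dT_modes) are hypothetical, and the fluctuation modes are assumed to be already scaled so that standard normal coefficients yield the \(15 \ \textrm{K}\) standard deviation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_temperature_fields(T_ref, dT0, dT_modes, n_samples=1):
    """Draw realizations of the random field of Eq. (5.3).

    T_ref    : (n_nodes,) reference temperature field
    dT0      : (n_nodes,) localized perturbation at the trailing edge
    dT_modes : (4, n_nodes) fluctuation modes (already scaled by their standard
               deviations, so that Upsilon_i ~ N(0, 1))
    """
    T_ref, dT0, dT_modes = map(np.asarray, (T_ref, dT0, dT_modes))
    fields = []
    for _ in range(n_samples):
        y0 = rng.integers(0, 2)       # Bernoulli(0.5): trailing-edge perturbation on/off
        y = rng.standard_normal(4)    # independent standard normal coefficients
        fields.append(T_ref + y0 * dT0 + y @ dT_modes)
    return np.asarray(fields)
```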

Let us now give more details about the construction of the fluctuation modes \(\{ \delta T_i \}_{1 \le i \le 4}\). First, surface fluctuation modes are computed on the boundary \(\partial \Omega \) using the method given in [21] for the construction of random fields on a curved surface. The correlation function is defined as a function of the geodesic distance \(d_G\) along the surface \(\partial \Omega \):

$$\begin{aligned} \rho ({\boldsymbol{\xi }},{\boldsymbol{\xi }}') = \exp \left( -\frac{d_{G}({\boldsymbol{\xi }},{\boldsymbol{\xi }}')}{d_{G}^0} \right) , \end{aligned}$$
(5.4)

where \(d_{G}^0\) is a correlation length. Geodesic distances are computed using the algorithm described in [18, 22] and implemented in the Python library gdist. A covariance matrix is built by evaluating the correlation function on pairs of nodes of the outer surface of the finite-element mesh, and multiplying the correlation by the constant variance. The four surface modes are then obtained by finding the four eigenvectors corresponding to the largest eigenvalues of the covariance matrix. The steady heat equation with Dirichlet boundary conditions is solved for each of these surface modes to derive the 3D fluctuation modes, using the Z-set [17] finite-element solver. The Python library BasicTools, developed by SafranTech, is used to read the finite-element mesh and write the temperature fields in a format that can be used for simulations in Z-set.
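A minimal sketch of the construction of the surface fluctuation modes is given below. It assumes that the matrix of geodesic distances between surface nodes has already been computed (for instance with gdist), and only illustrates the covariance assembly of Eq. (5.4) and the truncated eigendecomposition; the subsequent heat-equation solves are performed with Z-set and are not shown.

```python
import numpy as np

def surface_fluctuation_modes(d_geo, variance, d0, n_modes=4):
    """Surface Karhunen-Loeve modes from a dense geodesic-distance matrix d_geo.

    Eq. (5.4): correlation rho(x, x') = exp(-d_G(x, x') / d_G^0), multiplied by a
    constant variance to obtain the covariance matrix.
    """
    cov = variance * np.exp(-np.asarray(d_geo) / d0)
    eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:n_modes]    # indices of the largest eigenvalues
    # scale each eigenvector by sqrt(eigenvalue) so that N(0,1) coefficients
    # reproduce the prescribed covariance
    return eigvecs[:, order] * np.sqrt(eigvals[order])   # (n_surface_nodes, n_modes)
```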

Fig. 5.3

Reference temperature field (on the left), temperature perturbation at the trailing edge (field 0 = \(\delta T_0\)), and fluctuation modes (fields 1 to 4). The fluctuations in the fourth mode are located inside the blade, in the cooling channels [9]

5.2 ROM-net Based Uncertainty Quantification Applied to an Industrial High-Pressure Turbine Blade

This section develops the different stages of the ROM-net for the industrial test case presented in the previous section. Given our budget of 200 high-fidelity simulations, a dictionary containing two local ROMs is constructed using our clustering procedure. A logistic regression classifier is trained for automatic model recommendation using information identified by feature selection, followed by an alternative to the Gappy-POD for the full-field reconstruction of dual quantities. Then, the results of the uncertainty quantification procedure are presented. Finally, the accuracy of the ROM-net is validated using simulations for new temperature loadings.

5.2.1 Design of Numerical Experiments

Given the computational cost of high-fidelity mechanical simulations of the high-pressure turbine blade, the training data are sampled from the stochastic model for the thermal loading using a design of experiments. Our computational budget corresponds to 200 high-fidelity simulations, so a database of 200 temperature fields must be built. This database includes two separate datasets coming from two independent DoEs:

  • The first dataset is built from a Maximum Projection LHS design (MaxProj LHS DoE [13]) and contains 80 points. This dataset will be used for the construction of the dictionary of local ROMs via clustering. The MaxProj LHS DoE has good space-filling properties on projections onto subspaces of any dimension.

  • The second dataset is built from a Sobol’ sequence (Sobol’ DoE) of 120 points. Using a suboptimal DoE method ensures that this second dataset is different and independent from the first one. The lower quality of this dataset with respect to the first one is compensated by its larger population. This dataset will be used for learning tasks requiring more training examples than the construction of the local ROMs, namely the classification task for automatic model recommendation, and the training of cluster-specific surrogate models for the reconstruction of full fields from hyper-reduced predictions on a reduced-integration domain. These surrogate models (Gappy surrogates) replace the Gappy-POD [11] method that is commonly used in hyper-reduced simulations to retrieve dual variables on the whole mesh.

Fig. 5.4

Visualization of the MaxProj LHS DoE. The marginal distributions are represented on the diagonal. The 5D DoE is projected on 2D subspaces for visualization purposes, in order to check space-filling properties in 2D [9]

Fig. 5.5

Visualization of the Sobol’ DoE. The marginal distributions are represented on the diagonal. The 5D DoE is projected on 2D subspaces for visualization purposes, in order to check space-filling properties in 2D [9]

These DoEs are built with the platform Lagun. The fact that these two datasets come from two separate DoEs is beneficial: as each of them is supposed to have good space-filling properties, they are both representative of the possible thermal loadings and can therefore be used to define a training set and a test set for a given learning task. For instance, the classifier trained on the Sobol’ DoE can be tested on the MaxProj LHS DoE. The local ROMs built from snapshots belonging to the MaxProj LHS DoE can make predictions on the Sobol’ DoE that will be used for the training of the Gappy surrogates, which is relevant since the Gappy surrogates are supposed to analyze ROM predictions on new unseen data in the exploitation phase.

Drawing random temperature fields as defined in Eq. (5.3) requires sampling data from the random variables \(\{ \Upsilon _i \}_{0\le i \le 4}\), where \(\Upsilon _0\) follows the Bernoulli distribution with parameter 0.5 and the variables \(\Upsilon _i\) for \(i \in [\![ 1;4 ]\!]\) are independent standard normal variables and independent of \(\Upsilon _0\). Both DoE methods (Maximum Projection LHS and Sobol’ sequence) generate point clouds with a uniform distribution in the unit hypercube. Figures 5.4 and 5.5 show the projections onto 2-dimensional subspaces of the 5D point clouds used to build our datasets. The marginal distributions are plotted to check that they approximate the uniform distribution well. These point clouds, considered as samples of a random vector \((\chi _0, \chi _1, \chi _2, \chi _3, \chi _4)\) following the uniform distribution on the unit hypercube, are transformed into realizations of the random vector \((\Upsilon _0, \Upsilon _1, \Upsilon _2, \Upsilon _3, \Upsilon _4)\) using the following transformations:

$$\begin{aligned} \Upsilon _0 = \mathbbm {1}_{\chi _0 > 1/2} \qquad \text {and} \quad \forall i \in [\![1;4]\!], \quad \Upsilon _i = F^{-1}(\chi _i), \end{aligned}$$
(5.5)

where \(F^{-1}\) is the inverse of the cumulative distribution function of the standard normal distribution. The resulting samples define the MaxProj dataset and the Sobol’ dataset of random temperature fields, using Eq. (5.3). Each temperature field defines a thermal loading, using Eq. (5.1). The 200 corresponding mechanical problems are solved for one loading cycle with the finite-element software Z-set [17] with the domain decomposition method described in [3], with 48 subdomains. The average computation time for one simulation is 53 min.
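The transformation of Eq. (5.5), mapping the uniform DoE samples to realizations of \((\Upsilon _0, \ldots , \Upsilon _4)\), translates into a short sketch:

```python
import numpy as np
from scipy.stats import norm

def transform_doe(points):
    """Eq. (5.5): map uniform samples chi in [0, 1]^5 to (Upsilon_0, ..., Upsilon_4)."""
    points = np.atleast_2d(points)
    upsilon0 = (points[:, 0] > 0.5).astype(float)   # indicator 1_{chi_0 > 1/2} (Bernoulli(0.5))
    upsilon = norm.ppf(points[:, 1:])               # inverse CDF of the standard normal
    return np.column_stack([upsilon0, upsilon])
```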

5.2.2 ROM Dictionary Construction

The 80 simulations associated with the MaxProj dataset are used as clustering data. Loading all the simulation data and computing the pairwise ROM-oriented dissimilarities takes about 5 min. The ROM-oriented dissimilarity defined in [7, Definition 3.11] is computed with \(n=1\), i.e., each simulation is represented by one field. The dataset is partitioned into two clusters using our implementation of the PAM [14, 15] k-medoids algorithm, with 10 different random initializations for the medoids. The clustering results can be visualized using Multidimensional Scaling (MDS) [2]. MDS is an information visualization method which consists in finding a low-dimensional dataset \({\boldsymbol{Z}}_{0}\) whose matrix of Euclidean distances \({\boldsymbol{d}}({\boldsymbol{Z}}_{0})\) approximates the true dissimilarity matrix \({\boldsymbol{\delta }}\). To that end, a cost function called the stress function is minimized with respect to \({\boldsymbol{Z}}\):

$$\begin{aligned} {\boldsymbol{Z}}_{0} = \underset{{\boldsymbol{Z}}}{\text {arg min}} \left( \varsigma ({\boldsymbol{Z}};{\boldsymbol{\delta }}) \right) = \underset{{\boldsymbol{Z}}}{\text {arg min}}\left( \sum _{i<j}(\delta _{ij}-d_{ij}({\boldsymbol{Z}}))^{2} \right) . \end{aligned}$$
(5.6)

This minimization problem is solved with the Scaling by MAjorizing a COmplicated Function (SMACOF) algorithm [10] implemented in Scikit-Learn [19]. Figure 5.6 shows the clusters on the MDS representations obtained with the goal-oriented variant of the ROM-oriented dissimilarity measure, applied to the accumulated plastic strain \(p_{\text {cum}}^{o}\). The clustering results are compared with the expected clusters corresponding to \(\Upsilon _0 = 0\) and \(\Upsilon _0 = 1\), the latter corresponding to the perturbation \(\delta T_0\) being activated. The obtained clusters almost correspond to the expected ones, with only 4 points out of 80 having wrong labels, which quantifies the ability of the ROM-oriented dissimilarity measure on the accumulated plastic strain to infer the correct value of \(\Upsilon _0\). The medoids of the two clusters are given in Fig. 5.7. Cluster 0 contains temperature fields for which \(\Upsilon _0 = 1\), while cluster 1 contains fields for which \(\Upsilon _0 = 0\). It can be observed that the quantity of interest clearly differs from one cluster to the other, while the differences are hardly visible on the displacement field. The displacement field combines deformations associated with different phenomena (thermal expansion, elastic strains, viscoplastic strains) that are not necessarily related to damage in the structure, which could explain why the quantity of interest \(p_{\text {cum}}^{o}\) seems to be more appropriate for clustering in this example.
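For readers who wish to reproduce this step, the sketch below shows a plain k-medoids (Voronoi-iteration) clustering on a precomputed dissimilarity matrix, together with the MDS embedding used for visualization; it is a simplified stand-in for the PAM implementation and for the ROM-oriented dissimilarity computation used in this chapter.

```python
import numpy as np
from sklearn.manifold import MDS

def k_medoids(delta, k=2, n_init=10, seed=0):
    """Alternating k-medoids on a precomputed dissimilarity matrix delta (n x n)."""
    rng = np.random.default_rng(seed)
    n = delta.shape[0]
    best_cost, best_labels, best_medoids = np.inf, None, None
    for _ in range(n_init):
        medoids = rng.choice(n, size=k, replace=False)
        for _ in range(100):
            labels = np.argmin(delta[:, medoids], axis=1)   # assign to the nearest medoid
            new_medoids = medoids.copy()
            for j in range(k):                              # update the medoid of each cluster
                members = np.where(labels == j)[0]
                within = delta[np.ix_(members, members)].sum(axis=1)
                new_medoids[j] = members[np.argmin(within)]
            if np.array_equal(new_medoids, medoids):
                break
            medoids = new_medoids
        cost = delta[np.arange(n), medoids[labels]].sum()
        if cost < best_cost:
            best_cost, best_labels, best_medoids = cost, labels, medoids
    return best_labels, best_medoids

# 2D embedding for visualization (SMACOF minimization of the stress of Eq. (5.6)):
# Z0 = MDS(n_components=2, dissimilarity="precomputed").fit_transform(delta)
```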

Fig. 5.6

MDS representation of the clustering results using the ROM-oriented dissimilarity measure on the quantity of interest \(p_{\text {cum}}^{o}\) (goal-oriented variant). On the left, the colors correspond to the expected clusters. On the right, the colors correspond to the clusters identified by the clustering algorithm. The positions of the labels 0 and 1 coincide with the positions of the clusters’ medoids. The MDS relative error \(\varsigma ({\boldsymbol{Z}}_{0};{\boldsymbol{\delta }})/\varsigma ({\boldsymbol{0}};{\boldsymbol{\delta }})\) is \(12\%\) [9]

Fig. 5.7

The 3 fields on the left correspond to the medoid of cluster 0, and those on the right correspond to the medoid of cluster 1. The fields in the first and the third columns show the differences between the medoids’ temperature fields and the reference temperature field \(T_{\text {ref}}\) (the scale is truncated for the first field). The second and the fourth columns show the displacement magnitude field \(\sqrt{{\boldsymbol{u}}.{\boldsymbol{u}}}\) (top) and the quantity of interest \(p_{\text {cum}}^{o}\) (bottom) [9]

The simulations used for the clustering procedure can directly provide snapshots for the construction of the local ROMs. To control the duration of their training, only 20 simulations are selected to provide snapshots for each local ROM, which represents about half of each cluster’s population. These simulations are selected in a maximin greedy approach starting from the medoid (see [8, Algorithm 2, Stage 2] for an example of maximin selection). Figure 5.8 shows which simulations have been selected for the construction of the local ROMs.
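A minimal sketch of such a maximin selection, operating on a cluster's dissimilarity submatrix with hypothetical argument names, is shown below.

```python
import numpy as np

def maximin_selection(delta, medoid, n_select=20):
    """Greedy maximin selection of snapshot simulations within one cluster,
    starting from the medoid.

    delta  : pairwise ROM-oriented dissimilarities between the cluster's simulations
    medoid : index of the cluster's medoid in delta
    """
    selected = [medoid]
    candidates = [i for i in range(delta.shape[0]) if i != medoid]
    while len(selected) < n_select and candidates:
        # add the candidate that is farthest from the already-selected set
        i_next = max(candidates, key=lambda i: min(delta[i, j] for j in selected))
        selected.append(i_next)
        candidates.remove(i_next)
    return selected
```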

Fig. 5.8

MDS representation of the clustering results. Orange points represent the snapshots selected for cluster 0, while the light blue points represent the snapshots selected for cluster 1. For each cluster, the snapshots are selected by a maximin procedure starting from the medoid [9]

The local ROMs are built following the methodology described in Sect. 2.3. The snapshot-POD and the ECM are run in parallel with shared memory on 24 cores. The tolerance for the snapshot-POD is set to \(10^{-8}\) for the displacement field, and to \(10^{-4}\) for the dual variables (the quantity of interest \(p_{\text {cum}}^{o}\) and the six components of the stress tensor). The POD bases for the dual variables will be used for their reconstruction with the Gappy surrogates. The tolerance for the ECM is set to \(5 \times 10^{-4}\). The local ROMs each contain 18 displacement modes and between 8 and 13 modes for the stress components. The first and second local ROMs contain respectively 10 and 12 modes for the quantity of interest \(p_{\text {cum}}^{o}\). The ECM selects 506 (resp. 510) integration points for the reduced-integration domain of ROM 0 (resp. ROM 1). Building one local ROM takes approximately 2 h and 30 min.
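As an illustration of the tolerance-driven truncation mentioned above, a snapshot-POD sketch based on a truncated SVD is given below; the actual genericROM implementation may rely on the method of snapshots and on a different truncation criterion.

```python
import numpy as np

def snapshot_pod(snapshots, tol):
    """Snapshot-POD sketch: keep the smallest POD basis whose discarded
    relative energy is below tol.

    snapshots : (n_dofs, n_snapshots) matrix of primal or dual snapshots
    """
    U, s, _ = np.linalg.svd(np.asarray(snapshots), full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)              # retained relative energy
    n_modes = int(np.searchsorted(energy, 1.0 - tol) + 1)
    return U[:, :n_modes]                                 # columns are the POD modes
```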

5.2.3 Automatic Model Recommendation

In this section, a classifier is trained for the automatic model recommendation task. The 120 temperature fields coming from the Sobol’ dataset are used as training data for the classifier. Their labels are determined by finding their closest medoid in terms of the ROM-oriented dissimilarity measure. Hence, for each temperature field of the Sobol’ dataset, two dissimilarities are computed: one with the medoid of the first cluster, and one with the medoid of the second cluster. Once trained, the classifier can be evaluated on the 80 labelled temperature fields of the MaxProj dataset.

Fig. 5.9

Feature selection results. The kriging metamodel for redundancy terms is represented by the red curve and built from 800 true redundancy terms (blue points). The elements containing the selected nodes are represented in the turbine blade geometry [9]

Each temperature field is discretized on the finite-element mesh, which contains on the order of a million nodes. To reduce the dimension of the input space and facilitate the training phase of the classifier, we apply the geostatistical mRMR feature selection algorithm described in [8, Algorithm 1] to data from the Sobol’ dataset. First, 800 pairs of nodes are selected in the mesh, which takes 18 s. The 800 corresponding redundancy terms are computed with Scikit-Learn [19] in less than 3 s. Figure 5.9 plots the values of these redundancy terms versus the Euclidean distance between the nodes. We observe that the correlation between the redundancy mutual information terms and the distance between the nodes is poor, with a lot of noise. This can be due to the fact that the random temperature fields have been built using Gaussian random fields on the outer surface with an isotropic correlation function depending on the geodesic distance along the surface rather than the Euclidean distance. Since the turbine blade is a relatively thin structure, two nodes, one on the intrados and another one on the extrados, can be close to each other in Euclidean distance, but with totally uncorrelated temperature fluctuations because of the large geodesic distance separating them. On the contrary, two points on the same side of the turbine blade can have correlated temperature variations while being separated by a Euclidean distance on the order of the blade’s thickness. The length of the mutual information’s high-variance regime seems to correspond to the blade’s chord, which supports this explanation. The thinness of the turbine blade induces anisotropy in the correlation function of the bulk Gaussian random field defining the thermal loading, which implies an anisotropic behavior of the mutual information according to [8, Property 1]. The use of a local temperature perturbation \(\delta T_0\) in conjunction with fluctuation modes having larger length scales may also partially explain the large variance of redundancy terms. Nonetheless, it remains clear that redundancy terms are smaller for large distances. This trend is captured by a kriging metamodel (Gaussian process regression) trained with Scikit-Learn in a few seconds, with a sum-kernel involving the Matérn kernel with parameter 5/2 (to get a continuous and twice differentiable metamodel) and length scale 1, and a white kernel to estimate the noise level of the signal. The curve of the metamodel is given in Fig. 5.9. Then, for each node of the finite-element mesh, the mutual information with the label variable is computed. The computations of these relevance terms (on the order of a million terms) are distributed over 280 cores, which gives a total computation time of 15 min. Among these features, 5,986 are preselected by discarding those whose relevance mutual information is lower than 0.05. The geostatistical mRMR selects 11 features in 42 s. The corresponding nodes in the finite-element mesh can be visualized in Fig. 5.9.
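The kriging metamodel for the redundancy terms can be reproduced with Scikit-Learn as sketched below; the normalize_y option is our choice for this illustration, not taken from the original study.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

def fit_redundancy_metamodel(distances, redundancies):
    """Kriging (Gaussian process regression) of the redundancy mutual information
    terms as a function of the inter-node distance: Matern 5/2 kernel with length
    scale 1, plus a white kernel estimating the noise level."""
    kernel = Matern(length_scale=1.0, nu=2.5) + WhiteKernel()
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gpr.fit(np.asarray(distances).reshape(-1, 1), np.asarray(redundancies))
    return gpr   # gpr.predict(d.reshape(-1, 1)) approximates redundancy at new distances
```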

Remark 5.1

The metamodel for redundancy terms could be improved by defining it as a function of the precomputed geodesic distances along the outer surface rather than the Euclidean distances. Each finite-element node would be associated to its nearest neighbor on the outer surface before computing the approximate mutual information from geodesic distances.

The classifier is trained on the Sobol’ dataset, using the values of the temperature fields at the 11 nodes identified by the feature selection algorithm. The classifier is a logistic regression [1, 5, 6] with elastic net regularization [24] implemented in Scikit-Learn. The two hyperparameters involved in the elastic net regularization are calibrated using 5-fold cross-validation, giving a value of 0.001 for the inverse of the regularization strength, and 0.4 for the weight of the \(L^1\) penalty term (and thus 0.6 for the \(L^2\) penalty term). Due to the \(L^1\) penalty term, the classifier only uses 5 of the 11 input features. The classifier’s accuracy, evaluated on the MaxProj dataset so that it is assessed on new unseen data, reaches \(98.75 \%\). The confusion matrix indicates that \(100 \%\) of the test examples belonging to class 0 have been correctly labeled, and that \(2.38\%\) of the test examples belonging to class 1 have been misclassified. Table 5.1 summarizes the values of precision, recall and F1-score on the test data.
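A sketch of the classifier's construction with Scikit-Learn is given below; the cross-validation grid is illustrative, but it contains the calibrated values reported above (C = 0.001 and an \(L^1\) weight of 0.4), and the dataset names are hypothetical.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Elastic-net logistic regression on the 11 selected temperature values;
# both regularization hyperparameters are calibrated by 5-fold cross-validation.
classifier = GridSearchCV(
    LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5, max_iter=10000),
    param_grid={"C": [1e-4, 1e-3, 1e-2, 1e-1], "l1_ratio": [0.2, 0.4, 0.6, 0.8]},
    cv=5,
)
# classifier.fit(X_sobol, labels_sobol)                    # X_sobol: (120, 11) selected features
# accuracy = classifier.score(X_maxproj, labels_maxproj)   # evaluation on the MaxProj dataset
```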

Table 5.1 Classification results

5.2.4 Surrogate Model for Gappy Reconstruction

When using hyper-reduction, the ROM calls the constitutive equations solver only at the integration points belonging to the reduced-integration domain. It is recalled that the ECM selected 506 (resp. 510) integration points for the reduced-integration domain of ROM 0 (resp. ROM 1), and that the finite-element mesh initially contains a number of integration points on the order of a million. Therefore, after a reduced simulation, dual variables defined at integration points are known only at the integration points of the reduced-integration domain. To retrieve the full field, the Gappy-POD [11] finds the coefficients in the POD basis that minimize the squared error between the reconstructed field and the ROM predictions on the reduced-integration domain. This minimization problem defines the POD coefficients as a linear function of the predicted values on the reduced-integration domain. Although these coefficients are optimal in the least squares sense, they can be biased by the errors made by the ROM. To alleviate this problem, we propose to replace the common Gappy-POD procedure by a metamodel, or Gappy surrogate. The inputs and the outputs of the Gappy surrogate are the same as for the Gappy-POD: the input is a vector containing the values of a dual variable on the reduced-integration domain, and the output is a vector containing the optimal coefficients in the POD basis. One Gappy surrogate must be built for each dual variable of interest: in our case, 7 surrogate models per cluster are required, namely one for the quantity of interest \(p_{\text {cum}}^{o}\) and one for every component of the Cauchy stress tensor.

The training data for these Gappy surrogates are obtained by running reduced simulations with the local ROMs, using the thermal loadings of the Sobol’ dataset. Indeed, the two local ROMs have been built on the MaxProj dataset, therefore thermal loadings of the Sobol’ dataset can play the role of test data for the ROMs. For each thermal loading in the Sobol’ dataset, the true high-fidelity solution is already known since it has been computed to provide training data for the classifier. In addition, the exact labels for these thermal loadings are known, which means that we know which local ROM to choose for each thermal loading of the Sobol’ dataset. Given ROM predictions on the reduced-integration domain, the optimal coefficients in the POD basis are given by the projections of the true prediction made by the high-fidelity model (the finite-element model) onto the POD modes. This provides the true outputs for the Sobol’ dataset, which can then be used as a training set for the Gappy surrogates.

Given the high dimensionality of the input data (there are more than 500 integration points in the reduced-integration domains) with respect to the number of training examples (120 examples), a multi-task Lasso metamodel is used. The hyperparameter controlling the regularization strength is optimized by 5-fold cross-validation. Training the 14 Gappy surrogates (7 for each cluster) takes 1 h. The Gappy surrogates select between \(8\%\) and \(18\%\) of the integration points in the reduced-integration domains, due to the \(L^1\) regularization. The mean cross-validated coefficients of determination are 0.9637 (resp. 0.8935) for the quantity of interest for cluster 0 (resp. cluster 1), and range from 0.9404 to 0.9938 for the stress components. These satisfying results mean that it is not necessary to train a kriging metamodel on the variables selected by the Lasso to get nonlinear Gappy surrogates. The Gappy surrogates are therefore linear, just like the Gappy-POD.
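The sketch below illustrates the Gappy surrogate (a multi-task Lasso mapping RID values to POD coefficients) alongside the classical Gappy-POD least-squares reconstruction it replaces; the array shapes and argument names are indicative.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLassoCV

def train_gappy_surrogate(rid_values_train, pod_coeffs_train):
    """Multi-task Lasso mapping a dual variable's values on the reduced-integration
    domain (RID) to its optimal POD coefficients.

    rid_values_train : (n_train, n_rid_points) ROM predictions at the RID points
    pod_coeffs_train : (n_train, n_modes) projections of the high-fidelity fields
                       onto the dual POD modes
    """
    return MultiTaskLassoCV(cv=5).fit(rid_values_train, pod_coeffs_train)

def reconstruct_with_surrogate(surrogate, pod_modes, rid_values):
    """Full-field reconstruction from the values predicted on the RID."""
    coeffs = surrogate.predict(np.atleast_2d(rid_values))   # (1, n_modes)
    return pod_modes @ coeffs.ravel()                        # (n_integration_points,)

def gappy_pod(pod_modes, rid_indices, rid_values):
    """Classical Gappy-POD: least-squares fit of the POD coefficients on the RID rows."""
    coeffs, *_ = np.linalg.lstsq(pod_modes[rid_indices, :], rid_values, rcond=None)
    return pod_modes @ coeffs
```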

The accuracy gains provided by the Gappy surrogates with respect to the classical Gappy-POD on the present industrial case are investigated in Fig. 5.10. Here, 24 high-fidelity simulations in the first cluster (with \(\Upsilon _{0}=0\)) are computed as references, and the Gappy surrogates (using the two metamodels, Lasso and ElasticNet) and the classical Gappy-POD are evaluated using ROM 0. For both variants of metamodels and both quantities of interest (the accumulated plastic strain and the component 33 of the stress tensor), the Gappy surrogates provide more accurate predictions than the classical Gappy-POD.

Fig. 5.10

Mean over the complete mesh of dual quantities of interest: accumulated plastic strain \({p}_{\text {cum}}^o\) (left) and component 33 of the stress tensor \({\sigma }_{33}\) (right), plotted as points where the x-coordinate is the reference value, and the y-coordinate is the considered reduced prediction [9]

Remark 5.2

In this strategy, the local ROMs solve the equations of the mechanical problem, which enables using linear surrogate models to reconstruct dual variables. Using surrogate models from scratch instead of local ROMs would have been more difficult, given the nonlinearities of this mechanical problem and the lack of training data for regression. In addition, such surrogate models would require a parametrization of the input temperature fields, whereas the local ROMs use the exact values of the temperature fields on the RID without assuming any model for the thermal loading.

The dictionary-based ROM-net used for mechanical simulations of the high-pressure turbine blade is made of a dictionary of two local hyper-reduced order models and a logistic regression classifier. The classifier analyzes the values of the input temperature field at 11 nodes only, identified by our feature selection strategy. For a given thermal loading in the exploitation phase, after the reduced simulation with the local ROM recommended by the classifier, linear cluster-specific Gappy surrogates reconstruct the full dual fields (quantity of interest and stress components) from their predicted values on the reduced-integration domain.

5.2.5 Uncertainty Quantification Results

Once trained, the ROM-net can be applied for the quantification of uncertainties on the mechanical behavior of the HP turbine blade resulting from the uncertainties on the thermal loading. Since the ROM-net online operations can be performed sequentially on one single core, 24 cores are used in order to compute the solution for 24 thermal loadings at once. This way, 42 batches of 24 Monte Carlo simulations are run in 2 h and 48 min. The 1008 thermal loadings used for this study are generated by randomly sampling points from the uniform distribution on the 5D unit hypercube and applying the transformation given in Eq. (5.5).

Fig. 5.11

Histograms and probability density functions of the quantities of interest \(\overline{p}_{\text {cum}}^o\) (left) and \(\overline{\sigma }_{\text {eq}}\) (right) [9]

Table 5.2 Widths of the confidence intervals (CI) for the expectations, expressed as percentages of the estimated expectations

The expected values of \(\overline{p}_{\text {cum}}^o\) and \(\overline{\sigma }_{\text {eq}}\) are estimated with the empirical means \(\overline{Z}_n=\frac{1}{n}\sum _{i=1}^n Z_i\), where \(Z_i\) are the corresponding samples. The variances of \(\overline{p}_{\text {cum}}^o\) and \(\overline{\sigma }_{\text {eq}}\) are computed using the unbiased sample variance \(\displaystyle S_n^2=\frac{1}{n-1}\sum _{i=1}^n\left( Z_i-\overline{Z}_n\right) ^2\). The Central Limit Theorem gives asymptotic confidence intervals for the expected values: for all \(\alpha \in ]0;1[\),

$$\begin{aligned} I_n = \left[ \overline{Z}_n - \phi _{1-\frac{\alpha }{2}}\sqrt{S_{n}^{2}/n}; \ \overline{Z}_n + \phi _{1-\frac{\alpha }{2}} \sqrt{S_{n}^{2}/n} \right] , \end{aligned}$$
(5.7)

where \(\phi _r\) denotes the quantile of order r of the standard normal distribution \(\mathcal {N}(0,1)\), and \(I_n\) is an asymptotic confidence interval with confidence level \(1-\alpha \) for the expectation \(\eta \): \(\lim _{n \rightarrow +\infty } \mathbb {P}(\eta \in I_n ) = 1 - \alpha \). The widths of the confidence intervals are expressed as a percentage of the estimated value for the expectations in Table 5.2.
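The estimators and the confidence interval of Eq. (5.7) translate directly into a few lines:

```python
import numpy as np
from scipy.stats import norm

def monte_carlo_ci(samples, alpha=0.01):
    """Empirical mean, unbiased sample variance, and asymptotic (1 - alpha)
    confidence interval of Eq. (5.7) for a scalar quantity of interest."""
    samples = np.asarray(samples)
    n = samples.size
    mean = samples.mean()
    var = samples.var(ddof=1)                                   # unbiased S_n^2
    half_width = norm.ppf(1.0 - alpha / 2.0) * np.sqrt(var / n)
    return mean, (mean - half_width, mean + half_width)
```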

The probability density functions of the quantities of interest can be estimated using Gaussian kernel density estimation (see Sect. 6.6.1. of [12]). Figure 5.11 gives the histograms and estimated distributions for \(\overline{p}_{\text {cum}}^o\) and \(\overline{\sigma }_{\text {eq}}\). The shapes of these distributions highly depend on the assumptions made for the stochastic thermal loading.
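The kernel density estimates of Fig. 5.11 can be obtained, for instance, with SciPy; the bandwidth selection is left to SciPy's default rule in this sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde

def estimate_pdf(samples, n_points=200):
    """Gaussian kernel density estimate of the distribution of a quantity of interest."""
    samples = np.asarray(samples)
    kde = gaussian_kde(samples)                      # Scott's rule bandwidth by default
    x = np.linspace(samples.min(), samples.max(), n_points)
    return x, kde(x)                                 # evaluation grid and estimated density
```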

5.2.6 Workflow

Figure 5.12 provides an illustration of the workflow and the computational time of each step presented above.

Fig. 5.12
A process flow diagram starting from the MaxProj LHS DoE, the Sobol' DoE, and the Monte Carlo DoE; the first two feed the finite-element simulations with domain decomposition, the physics-informed cluster analysis and medoids, the construction of the 2 local ROMs, the data labeling, and the feature selection.

Workflow for the ROM-net methodology applied to the considered industrial setting [9]

5.2.7 Verification

For verification purposes, the accuracy of the ROM-net is evaluated on 20 Monte Carlo simulations with 20 new thermal loadings. These thermal loadings are generated by randomly sampling points from the uniform distribution on the 5D unit hypercube, and applying the transformation given in Eq. (5.5). The reduced simulations are run on single cores. The total computation time for generating a new thermal loading on the fly, selecting the corresponding reduced model, running one reduced simulation and reconstructing the quantities of interest is 4 min on average. As a comparison, one single high-fidelity simulation with Z-set [17] with 48 subdomains takes 53 min, which means that the ROM-net computes 13.25 times faster. However, one high-fidelity simulation requires 48 cores for domain decomposition, whereas the ROM-net works on one single core. Hence, using 48 cores to run 48 reduced simulations in parallel, 636 reduced simulations can be computed in 53 min with the ROM-net, while the high-fidelity model runs only one simulation. In addition to the acceleration of numerical simulations, energy consumption is reduced by a factor of 636 in the exploitation phase. In spite of the fast development of high-performance computing, numerical methods computing approximate solutions with reduced computational resources and time are particularly important for many-query problems such as uncertainty quantification, where the intensive use of computational resources is a major concern. Model order reduction and ROM-nets play a prominent role toward green numerical simulations [20]. Of course, the number of simulations in the exploitation phase must be large enough to compensate for the effort made in the training phase, as in any machine learning or model order reduction problem.

Figures 5.13 and 5.14 show the results for two simulations belonging to cluster 0 and cluster 1 respectively. These figures give the difference between the current temperature field and the reference one, i.e., the field \(T-T_{\text {ref}}\), and the resulting variations of the quantity of interest predicted by the ROM-net and the high-fidelity model, i.e., \(p_{\text {cum}}^{o,\text {ROM}}(T) - p_{\text {cum}}^{o,\text {HF}}(T_{\text {ref}})\) and \(p_{\text {cum}}^{o,\text {HF}}(T) - p_{\text {cum}}^{o,\text {HF}}(T_{\text {ref}})\). The signs and the positions of the variations of the quantity of interest seem to be quite well predicted by the ROM-net.

Fig. 5.13

Comparison between high-fidelity predictions (middle column) and ROM-net’s predictions (right-hand column). The field on the left represents the difference between the current temperature field (belonging to cluster 0) and the reference one. The other fields correspond to the increments of the quantity of interest \(p_{\text {cum}}^{o}\) with respect to its reference state obtained with the reference temperature field [9]

Fig. 5.14

Comparison between high-fidelity predictions (middle column) and ROM-net’s predictions (right-hand column). The field on the left represents the difference between the current temperature field (belonging to cluster 1) and the reference one. The other fields correspond to the increments of the quantity of interest \(p_{\text {cum}}^{o}\) with respect to its reference state obtained with the reference temperature field [9]

Let us introduce a zone of interest \(\Omega '\) defined by all the integration points at which \(p_{\text {cum}}^o\) is higher than \(0.4 \times \max _{{\boldsymbol{\xi }}} p_{\text {cum}}^{o} ({\boldsymbol{\xi }})\) for the thermal loading defined by \(T_{\text {ref}} + \delta T_0\). This zone of interest contains 209 integration points. The values of the variables \(p_{\text {cum}}^o\) and \(\sigma _{\text {eq}}\) averaged over \(\Omega '\) are denoted by \(\overline{p}_{\text {cum}}^o\) and \(\overline{\sigma }_{\text {eq}}\). Table 5.3 gives different indicators quantifying the errors made by the ROM-net: the \(L^2\) relative errors on the whole domain \(\Omega \) and on the zone of interest \(\Omega '\), the \(L^{\infty }\) relative errors on \(\Omega \) and \(\Omega '\), the relative errors on \(\overline{p}_{\text {cum}}^o\) and \(\overline{\sigma }_{\text {eq}}\), and the errors on the locations of the points where the fields \(p_{\text {cum}}^o\) and \(\sigma _{\text {eq}}\) reach their maxima. All the relative errors remain on the order of \(1 \%\) or \(2 \%\), which validates the methodology. In addition, the ROM-net perfectly predicts the position of the critical points at which \(p_{\text {cum}}^o\) and \(\sigma _{\text {eq}}\) reach their maxima. Figure 5.15 shows errors on the quantities of interest.
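The error indicators of Table 5.3 can be computed as sketched below for one dual field; the argument names are hypothetical.

```python
import numpy as np

def error_indicators(field_rom, field_hf, coords):
    """L2 and L-infinity relative errors, and distance between the locations of the
    maxima, for one dual field known at the integration points.

    field_rom, field_hf : ROM-net and high-fidelity values at the integration points
    coords              : (n_points, 3) coordinates of the integration points
    """
    field_rom, field_hf = np.asarray(field_rom), np.asarray(field_hf)
    l2_rel = np.linalg.norm(field_rom - field_hf) / np.linalg.norm(field_hf)
    linf_rel = np.max(np.abs(field_rom - field_hf)) / np.max(np.abs(field_hf))
    # distance between the points where the two fields reach their maximum
    dist_max = np.linalg.norm(coords[np.argmax(field_rom)] - coords[np.argmax(field_hf)])
    return l2_rel, linf_rel, dist_max
```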

Table 5.3 Error indicators for the evaluation of the ROM-net on 20 new thermal loadings
Fig. 5.15

Errors on the quantity of interest \(p_{\text {cum}}^{o}\). The red (resp. blue) color is used for zones where the quantity of interest is overestimated (resp. underestimated) [9]