Abstract
We introduce a framework for statistical inference of the closure coefficients using machine learning methods. The objective of this framework is to quantify the epistemic uncertainty associated with the closure model by using experimental data via Bayesian statistics. The framework is tailored towards cases for which a limited amount of experimental data is available. It consists of two components. First, by treating all latent variables (non-observed variables) in the model as stochastic variables, all sources of uncertainty of the probabilistic closure model are quantified by a fully Bayesian approach. The probabilistic model is defined to consist of the closure coefficients as parameters and other parameters incorporating noise. Then, the uncertainty associated with the closure coefficients is extracted from the overall uncertainty by considering the noise being zero. The overall uncertainty is rigorously evaluated by using Markov-Chain Monte Carlo sampling assisted by surrogate models. We apply the framework to the Spalart–Allmars one-equation turbulence model. Two test cases are considered, including an industrially relevant full aircraft model at transonic flow conditions, the Airbus XRF1. Eventually, we demonstrate that epistemic uncertainties in the closure coefficients result into uncertainties in flow quantities of interest which are prominent around, and downstream, of the shock occurring over the XRF1 wing. This data-driven approach could help to enhance the predictive capabilities of computational fluid dynamics (CFD) in terms of reliable turbulence modeling at extremes of the flight envelope if measured data is available, which is important in the context of robust design and towards virtual aircraft certification. The plentiful amount of information about the uncertainties could also assist when it comes to estimating the influence of the measured data on the inferred model coefficients. Finally, the developed framework is flexible and can be applied to different test cases and to various turbulence models.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
When using computational fluid dynamics (CFD) to investigate complex, industrial relevant aircraft configurations at high Reynolds numbers, the direct simulation of turbulent fluctuations is still computationally prohibitive. Thus, typically, the Reynolds averaged Navier–Stokes (RANS) equations are solved and turbulent behavior is modelled using so called turbulence models. Various different models are available. However, all of them are based on coefficients which need to be tuned theoretically and empirically based on experimental data in order to obtain accurate predictions. Traditionally, this was and is a rather manual process in which one tries to match numerical results with canonical experiments without accounting for uncertainties. Due to increasing computational resources and data being available, data-driven approaches based on machine learning gained attention for such coefficient tuning tasks within the past few years [1]. The idea of these data-driven approaches is to re-infer and/or to adjust the established coefficients or terms in the turbulence model based on experimental data or numerical data computed by direct numerical simulations (DNS) or other higher-fidelity numerical techniques by using statistical methods.
In CFD computations, the issue of physical modeling inaccuracy is normally inevitable. Numerical errors due to spatial and temporal discretization of the governing equations and physical models are one of the most important issues in CFD computations [2]. These inaccuracy and errors are recognized as epistemic uncertainty, which can be reduced if more knowledge is gained. Turbulence models are one of the important issues here. In measurements, on the other hand, the observed data may contain measurement errors due to various unknown external factors. These unknown factors are classified as aleatory uncertainty, which is inherently regarded as irreducible.
Data-driven methods depend on our hypothesis in statistical inference. One example of a conceivable hypothesis is that the form of the turbulence closure model is accurate enough, but that the coefficients defined in empiricism that relate to specific flow properties are not always correct for a variety of flow cases due to insufficient knowledge. In this article these coefficients are treated as stochastic by Bayesian inference. Bayesian inference refers to statistical inference where uncertainty in inferences is quantified using probability. In general, statistical inference methods vary from problem to problem but can abstractly be grouped in two categories on how to treat the parameters (latent variables), which are the coefficients in closure models. On the one hand, if large data sets are available, so-called point-estimate approaches are commonly applied with deep learning being the arguably most promising and efficient methodology. On the other hand, the inference of the latent variables by regarding them as stochastic by enforcing a Bayesian perspective provides uncertainty quantification (UQ) capabilities associated with the latent variables and enables another perspective on the quantities of interest (QoIs) in terms of observable variables. These capabilities become especially remarkable when the amount of data is relatively small. However, effective numerical techniques for UQ are crucial especially in high dimensional cases. In this article, the closure coefficients are treated as stochastic latent variables. The expression latent variables refers to unobserved variables and is used as an antonym to observed variables such as input and output variables. The stochastic approach then enables to quantify uncertainties of the latent variables in the QoI in terms of credible intervals. The above-mentioned categories are rigorously classified in Sect. 2. In this article, to emphasize the differences between tuning parameters by a point-estimate and evaluating them stochastically by an interval-estimate, we sometimes use the expressions calibration and inference, respectively. Here we start with reviewing some previous work on the point-estimate approach.
One of the examples of turbulence model calibration with the point-estimate approach is to minimize the error between experimental data and the output of RANS-based CFD computations for some QoIs. This is a direct approach. From a statistical inference perspective, this is known as a parametric regression process or parameter calibration: the turbulence model is considered as a nonlinear regression model whose inputs and outputs are parameters such as flow conditions, the numerical mesh, etc., and QoIs such as integral aerodynamic coefficients or pressure distributions, respectively. The closure coefficients are treated as the parameters (or globally speaking, latent variables) of the nonlinear regression model. Thus, least-squares optimization through minimizing the mean squared error (MSE) [3] provides turbulence closure coefficients calibration. In general, during the calibration process a larger number of RANS simulations are required causing a huge computational burden especially for industrial relevant cases. Hence, large computational costs during the optimization can be partly circumvented by relying on surrogate models such as Gaussian processes or neural networks [3, 4]. Another alternative is the use of adjoint flow solvers to efficiently compute the gradients of the QoIs with respect to the closure coefficients, enabling efficient gradient-based optimization for solving the least-square problems.
Another example of approaches using the point-estimate is calibration of the turbulence closure coefficients as the output of the statistical inference methods [5,6,7]. Since the coefficients to be inferred are now the output of regression models, not parameters in the models, any kind of supervised learning techniques are applicable. For example, the authors in Refs. [5,6,7] used neural networks to handle millions of data points by considering spatio-temporal parameters on flow fields.
These two approaches introduced above can theoretically be extended to an interval-estimate via the Bayesian perspective to evaluate the uncertainty associated with the closure coefficients. For example, in the direct approach using the least squares, the closure coefficients as the latent variables can be treated as stochastic to formulate a posterior distribution of the coefficients. As a result, the output QoIs also form a probability distribution, which is called predictive distribution. These probability distributions cannot be obtained analytically because the probabilistic models behind are inherently nonlinear regression models. The second approach mentioned in above may have further difficulties in extension to the interval-estimate. In case the parameters to be calibrated are high-dimensional, the use of Gaussian processes may become infeasible [7]. Bayesian neural networks would have to rely on approximation methods to compute the posterior distribution because of the high dimensionality of the parameters. In this article, our approach is based on the direct approach that the nonlinear regression model is a turbulence model equation with closure coefficients as parameters (the latent variables) to be inferred using the interval-estimate via the Bayesian perspective.
There are several methods to compute the posterior and the predictive distributions. The variational inference and the Laplace approximation [8] are versatile methods to approximate the posterior distribution by a parametric probability distribution. Since both of them require an iterative optimization process with many RANS computations, the direct simulation is still not as practical as the calibration process by the least squares. Markov-chain Monte Carlo (MCMC) methods [8, 9] are used to approximate the posterior distribution of a parameter of interest by random sampling in a parameter space. Unlike the above-mentioned parametric approximation methods, MCMC provides a theoretical guarantee that the histograms of the obtained sample points compose the posterior distribution when the sample size approaches infinity. Since each sample requires a RANS computation and the number of samples is usually greater than the number of iterations of the optimization, a direct implementation of MCMC is infeasible. It is essential for all MCMC methods to be supported by surrogate models in order to reduce the number of RANS computations. Once a surrogate model is constructed, the computational cost of MCMC can be negligible and MCMC can be regarded to have higher fidelity for computing the posterior than the other methods. However, it has to be noted that the application of MCMC to high-dimensional spaces remains a challenging topic. Also, from the point of view of surrogate model construction, it is desirable to keep the dimension of the parameters to be inferred as small as possible.
There is a body of existing research on surrogate-assisted MCMC approaches [10,11,12]. In Ref. [12], output uncertainty is directly computed via the samples generated by MCMC using an efficient boundary layer code. In Ref. [11] MCMC is used to evaluate the probability in the Bayesian update process for the purpose of calibrating turbulence model parameters. Surrogate-assisted MCMC approaches are also used for sensitivity analysis in the context of UQ [3, 13]. A review of recent work related to turbulence models can be found in Ref. [14].
In this article, the uncertainty associated with the closure coefficients is rigorously evaluated with a help of MCMC. The quantified uncertainty in the QoIs can be identified intuitively as a lack of knowledge of the coefficients. On the other hand, there are other uncertainties lurking in flow physics as mentioned above. All the other uncertainties, no matter where they come from, are classified as noise in a Gaussian probabilistic model to focus on the uncertainty associated with the closure coefficients.
The probabilistic model is a parametric model governed by a model parameter that describes the set of closure coefficients and a parameter to control the noise. The main feature of the Bayesian perspective here is to regard the parameters (the latent variables) as stochastic variables. In order to fully execute the Bayesian approach to account for all the uncertainties that are present, all the parameters are treated as stochastic. This modeling approach can quantify the uncertainty in the QoIs that stems from the contribution of all the uncertainties, i.e., uncertainty due to the closure coefficients and due to noise. However, our interest is to quantify the uncertainties associated with the closure coefficients. From a physical perspective, this uncertainty needs to be clearly distinguished from the noise. However, the parameter to control the noise depends on the turbulence model parameters when data is observed (it forms a joint probability). A method for extracting the uncertainty associated with the closure coefficients is presented and demonstrated using various test cases. Note that modeling only the closure coefficients as stochastic variables yields different results which do not account for the effects of the uncertainty due to noise.
As mentioned earlier, the ability to quantify the epistemic uncertainty associated with closure coefficients has a number of potential applications, for example, feedback to experimental measurement techniques and propagation of the uncertainty to the output QoI for arbitrary flow types. It can be possible to identify at which domains in the input parameter space (such as angles of attack, Mach number, and the location on the wings) we need to measure the output QoI in experiments (sensor placement). Note that this process is basically the same concept as so-called Bayesian optimization and active learning, effectively employing the posterior uncertainty. There is, however, an important difference as the inferred parameters (the latent variables) are physically meaningful in our applications.
The target test case to demonstrate the data-driven Bayesian inference of the turbulence model closure coefficients is based on the Airbus XRF1 research aircraft configuration and corresponding wind tunnel data at flight Reynolds numbers. The RAE2822 airfoil is also used to verify the methodology, but with numerically generated measured data. The output QoIs are the drag and surface pressure coefficients, for which measured data is required to infer the model coefficients.
The flow solver and the turbulence model used in this article, the methodologies and the developed framework are introduced in Sect. 2. In Sect. 3, results are presented for two test cases and discussed. Section 4 provides a summary and conclusions.
2 Methodology
This section introduces a framework for data-driven turbulence modeling and provides a generalized perspective to showcase that the developed framework can be applied to other (turbulence) models as well as other test cases.
2.1 Flow solver and turbulence model
The flow solver employed is the DLR TAU-code. It is based on a finite volume scheme, where inviscid terms are computed via a first or second order scheme. Viscous terms are computed via a second order scheme. Details of the solver can be found in Ref. [15,16,17]. In this paper, matrix dissipation is used. For time integration a LUSGS implicit scheme is chosen. Convergence acceleration is achieved with a multi-grid algorithm based on agglomerated coarse grids. Various turbulence models are available and the negative Spalart–Allmaras turbulence model [18] (referred to as the negative S–A) is investigated and hence altered in this paper.
The equations of the negative S–A are briefly described next. The negative S–A is derived from the original Spalart–Allmaras turbulence model (referred to as the original S–A). Details of the derivations provided below are available in Ref. [18]. The original S–A is a linear eddy viscosity model based on Boussinesq approximation. The turbulent eddy viscosity \({\nu}_{t}\) is calculated by:
Here, the transport equation to be solved to obtain \(\tilde{\nu }\) is as:
where \(\tilde{S}\) is the modified vorticity,
where \(S\) is the vorticity magnitude. The other coefficients in Eqs. (1–3) are defined as:
The original S–A contains two additional coefficients namely \(c_{t1}\) and \(c_{t2}\). In the DLR TAU-code, the trip term in Eq. (2) is neglected. Therefore, 9 closure coefficients are accounted for.
In the negative S–A, there are two modifications to the original S–A. The first one is that the modified vorticity \(\tilde{S}\) in Eq. (3) is enforced to be positive. The other one is a modification of the diffusion coefficient \(\nu + \tilde{\nu }\) in Eq. (2). Therefore \(\overline{S}\) in Eq. (2) is represented by two additional coefficients \(c_{v2}\) and \(c_{v3}\) when \(\overline{S} \leftarrow c_{v2} S\) as:
Otherwise Eq. (3) is retained. The diffusion coefficient \(\nu + \tilde{\nu }\) in Eq. (2) is modified to be \(\nu + \tilde{\nu }f_{n}\), where \(f_{n}\) is expressed as follows by introducing one coefficient \(c_{n1}\):
Hence, the negative S–A includes 12 closure coefficients. These coefficients are regarded as a parameter vector which should be inferred. Moreover, constraints on the feasible intervals of the coefficients are introduced to ± 50% of the default values. Table 1 shows the original coefficient values of the negative S–A and the valid intervals for the inference processes. Note that in Ref. [18], the modified vorticity \(\tilde{S}\) in the negative S–A should not below 0.3\(S\) posing an additional implicit constraint. Equations (3) and (5), might introduce further constraints on the intervals on \(c_{v2}\) and \(\kappa\), but these restrictions are not considered.
2.2 Theory
This sub-section introduces a parametric statistical inference method as the cornerstone of the developed framework. The process of the statistical inference method is fundamentally composed of two steps, learning and prediction. These steps are conducted on a parametric probabilistic model that we define. The definition relies on two hypotheses as the first two steps. Eventually the process of the parametric statistical inference is in a broad way composed of the following four steps:
-
1.
Define a regression model
-
2.
Define a probabilistic model
-
3.
Compute a posterior by given data (learning process)
-
4.
Compute output distributions (prediction process)
The first two steps introduce the two hypotheses by each. The overall process here can be also regarded as extension of Bayesian linear regression to nonlinear regression models in general. A principal figure to illustrate the process is shown Fig. 1. The goal is to compute the posterior distribution \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) and the predictive distribution contributed only by the closure coefficients expressed as \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) in Fig. 1. Details of each step are described in the next sub-sections.
2.2.1 The 1st Step: defining a regression model
The regression model here means a deterministic function that produces a scalar output \(y\) when an arbitrary input vector \({\varvec{x}}\) is given. The model characteristics are controlled by a parameter vector \({\varvec{a}}\). These are deterministic variables currently. Normally the parameter \({\varvec{a}}\) is being determined by the learning process in the 3rd step with available data. Based on this concept, a CFD computation using a turbulence model can be represented as the deterministic function device:
where \({\varvec{a}}\) represents a vector as a set of all the turbulence model coefficients. The input \({\varvec{x}}\) is a vector as a set of flow conditions and/or a spatial coordinate. The output \(y\) is a scalar value as an output aerodynamic value such as lift and drag or distributed quantities, e.g. pressure coefficient distributions. The function \(f\) represents then the CFD computation with a turbulence model as a mapping \(f:\left( {{\varvec{x}},{\varvec{a}}} \right) \mapsto y\). For example, in this paper a turbulence model here is the negative S–A. This is the first hypothesis in the statistical inference method. As this first hypothesis for example, other turbulence models could be set here by modeling it as \(f\left( {{\varvec{x}},{\varvec{a}}} \right)\). The negative S–A has 12 independent coefficients, which are summarized as one parameter vector \({\varvec{a}}\), whose dimensionality is 12, i.e. \({\varvec{a}} = \left( {\sigma_{SA} ,\kappa ,c_{b1} ,c_{b2} ,c_{v1} ,c_{v2} ,c_{v3} ,c_{w2} ,c_{w3} ,c_{ct3} ,c_{ct4} ,c_{n1} } \right)\) by following Table 1. The parameter \({\varvec{a}}\) in Eq. (7) is called model parameter or simply parameter in this paper. Note that when our interest is only the input and the output as supervised learning, the function \(f\) is freely chosen (e.g. by model selection conducted in the third and fourth steps).
2.2.2 The 2nd Step: defining a probabilistic model
In the second step, now the output \(y\) is considered to be a stochastic variable. The process starts from assuming that the observed data \(y\) as the output has been generated by a probabilistic model as \(y\sim p\left( y \right)\). Our interest is evaluating the epistemic uncertainty associated with the turbulence model \(y\sim p\left( y \right)\) by diminishing the effects of all the other uncertainties, even the mesh and the scheme effects. Other possible strategies for Bayesian inference can be found in Ref. [19]. First, the discrepancy between the output \(y\) and the model \(f\left( {{\varvec{x}},{\varvec{a}}} \right)\) is represented by introducing a stochastic variable labelled \(\varepsilon\). Thus, the output \(y\) can be defined as follows:
The physical meaning of the stochastic variable \(\varepsilon\) is noise in the output value \(y\). Since \(\varepsilon\) is stochastic, a probabilistic model is introduced into this variable. One example of the probabilistic model of \(\varepsilon\) is an univariate Gaussian distribution as \(\varepsilon \sim {\mathcal{N}}\left( {0,\sigma^{2} } \right)\), where \({\mathcal{N}}\) and \(\sigma^{2}\) represents the Gaussian distribution and its variance, respectively. This eventually brings the following conditional probability distribution of \(y\):
Now \(\sigma\) has been added as another parameter (called hyperparameter or noise parameter in this paper to be distinguished with the parameter \({\varvec{a}}\)). Equation (9) is the probabilistic model of the observed variable \(y\) as the objective here. Different from the observed variable \(y\) (and \({\varvec{x}}\)), the parameters \({\varvec{a}}\) and \(\sigma\) are not observed (the latent variables). In the fully Bayesian approach, the hyperparameter \(\sigma\) is also treated as a stochastic variable as \(\sigma \sim p\left( \sigma \right)\). Thus, the defined probabilistic model can account for all the uncertainties that are present.
Following Eq. (8), two sorts of induced uncertainties can be easily separated by looking at each term individually. since the model parameter \({\varvec{a}}\) is the only parameter that can directly interact with the internal structures of the model \(f\). All the other external uncertainties are dealt within the term \(\varepsilon\) governed by the noise parameter \(\sigma\). No other functions (such as bias functions) are added in Eq. (8). The parameter \({\varvec{a}}\) is then treated as a stochastic variable in the Bayesian approaches as \({\varvec{a}}\sim p\left( {\varvec{a}} \right)\). The function \(f\) as the first term also becomes stochastic. The stochastic variable \(y\sim p\left( y \right)\) as the predictive distribution becomes a mixture of probabilities derived from the two stochastic terms after the learning process. After the learning process when data is given, all the parameters in Eq. (9) as the latent variables are not independent of each other and compose a joint probability distribution. In the 4th step, a rigorous form of the probability of \(y\) as \(p\left( y \right)\) is presented. In that form, the contribution of the uncertainty associated with the parameter \({\varvec{a}}\) is not that obvious like Eq. (8) due to the above fact.
2.2.3 The 3rd Step: computing a posterior by given data (learning process)
The third step is the learning process of the parameters \({\varvec{a}}\) and \(\sigma\) using data. In this step, derivation to the Bayesian approach is employed. The process in this step is to evaluate the joint probability of all the parameters \({\varvec{a}}\) and \(\sigma\) as a posterior distribution. The posterior distribution can be evaluated using the Bayes’ theorem by prior distributions of all the parameters \({\varvec{a}}\) and \(\sigma\), and the likelihood function obtained by the probabilistic model and the data. How to compute the posterior function is described below. Some additional details about the definition of the prior distributions etc. are in Appendix 1.
The likelihood function can be defined by data and the probabilistic model defined in the second step as:
where \({\mathbf{X}} = \left( {{\varvec{x}}^{\left( 1 \right)} ,{\varvec{x}}^{\left( 2 \right)} , \ldots ,{\varvec{x}}^{\left( N \right)} } \right)\) and \({\mathbf{Y}} = \left( {y^{\left( 1 \right)} ,y^{\left( 2 \right)} , \ldots ,y^{\left( N \right)} } \right)\) are observed data. \(N\) denotes the sample size of the data. The data is assumed to be independently drawn from the defined probabilistic model by Eq. (9) since the model is an univariate Gaussian distribution, hence no correlations exist. It has to be emphasized that direct evaluation of the likelihood function by Eq. (10) is practically available for the learning process instead of that of the posterior \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) (see Appendix 1). The computation of the marginal likelihood \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}}} \right)\) used as the normalization constant is either not necessary in executing the MCMC sampling techniques.
There are mainly two perspectives in the learning process (also see Fig. 1). One is point-estimate while the other is interval-estimate. In practice, these names are driven from the properties of the parameters to be inferred using the posterior distribution. The commonly known least square (LSQ) approach can be categorize as a special case of the point-estimate. The interval-estimate simply corresponds to using all the information of the posterior itself. The point-estimate in this paper directly indicates maximum a posterior (MAP). The MAP can be achieved by literally maximizing a posterior function \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) with respect to the parameters \({\varvec{a}}\) and \(\sigma\):
In practice, Eq. (11) is solved analytically when the regression model \(f\left( {{\varvec{x}},{\varvec{a}}} \right)\) is a linear regression model. Since the regression model here is a nonlinear regression model, typically Eq. (11) is solved employing optimizers.
Under the above-mentioned condition that no specific prior distribution is set, the direct evaluation of the likelihood function by Eq. (10) is available in practice. When pursuing the point-estimate using an optimizer, error functions which are derived by taking the negative of the log-likelihood function (see Appendix 5 for further details) are practically used. When pursuing the interval-estimate, the sampling method using MCMC can be directly applied to the likelihood function in Eq. (10). This evaluation by MCMC provides an approximation of the posterior \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) by sample points. The posterior is composed by a histogram of the sample points in the parameters space \({\varvec{a}},\sigma\). This means that the frequency of the generated sample points directly forms a scaled function of the posterior. The mode of the posterior \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) represents a pair of the calibrated parameters \(\hat{\user2{a}},\hat{\sigma }\). By using the obtained sample points, the predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) shown in Fig. 1 as our goal are computed at the fourth step. Again note that same as requirement of optimizers in Eq. (11), the reason why numerical approaches are required is because the regression model \(f\left( {{\varvec{x}},{\varvec{a}}} \right)\) is a nonlinear regression model.
Various established algorithms for MCMC are available and herein the random walk Metropolis-Hastings is applied. Details on MCMC algorithms can be found in many papers [8, 9]. As for optimizer for the point-estimate, we use differential evolution algorithm, which is a heuristic method. The reason why we do not use only the MCMC even for the point-estimate is for double-check of the accuracy of the tools. Therefore, the results by optimizers are not directly used in the prediction process.
2.2.4 The 4th Step: computing output distributions (prediction process)
In the last step the output \(y\) is computed by using the learned parameters: \(\hat{\user2{a}},\hat{\sigma }\) by the point-estimate, or \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) by the interval-estimate. First as for the point-estimate, the prediction of the output \(y\) for a given input \({\varvec{x}}\) is obtained by evaluating the regression model in Eq. (7):
Note that even for the point-estimate, the predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)\) is introduced following Eq. (9). However, this predictive distribution contains only the noise \(\varepsilon\) as identified by the point-estimate represented by the standard deviation \(\hat{\sigma }\) as \(\varepsilon \sim {\mathcal{N}}\left( {0,\hat{\sigma }^{2} } \right)\).
In the interval-estimate (with fully Bayesian perspective), the parameters \({\varvec{a}},\sigma\) form a joint probability as \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) which is represented by the MCMC sample points. The predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)\) is obtained as follows by the sum and product rules of the probability theory:
The parameters \({\varvec{a}},\sigma\) are marginalized (integrated out) to be eliminated. This operation can be regarded as taking the expectation. The probability as the weight is already realized by the MCMC sampling result. Therefore, it can be computed by mixtures of the probabilistic model \(p\left( {y{|}{\varvec{x}},{\varvec{a}}^{\left( i \right)} ,\sigma^{\left( i \right)} } \right)\), where each sample point \(i\) is weighted with its respective probability \(p\left( {{\varvec{a}}^{\left( i \right)} ,\sigma^{\left( i \right)} {|}{\mathbf{X}},{\mathbf{Y}}} \right)\). The constraint that the sum of all the weights has to be 1 to be a probability is accomplished by being divided with the sample size \(N_{mcmc}\). Note that, the predictive distribution shown in Eq. (13) is a common way for a fully Bayesian approach [8]. No hyperparameters in priors of the parameters \({\varvec{a}}\) and \(\sigma\) are explicitly presented in Eq. (13) since an improper prior for both \({\varvec{a}}\) and \(\sigma\) is assumed. Note that investigation when using an empirical Bayes approach in contrast to the full Bayes approach is shown in Appendices 2 and 3. In Appendix 3, a comparison of results between \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)\) in Eq. (13) and \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) in Eq. (15a, 15b) using the empirical Bayes approach is also shown.
The predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)\) in Eq. (13) represents the contribution of all the sources of the uncertainties, one of which is associated with the model parameter \({\varvec{a}}\), and all the others of which is associated with the noise parameter \(\sigma\). These two sorts of uncertainties physically stem from different sources as shown in Eq. (8) and the predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)\) in Eq. (13) lost the information of their origins. Our goal is to quantify the epistemic uncertainty associated with the closure coefficients \({\varvec{a}}\), without including the external factors represented by the noise \(\varepsilon \sim {\mathcal{N}}\left( {0,\sigma^{2} } \right)\). This is accomplished through the following two steps by using the sum and product rules of the probability theory. The first step is to obtain the marginal posterior \(p\left( {{\varvec{a}}{|}{\mathbf{X}},{\mathbf{Y}}} \right)\) by marginalizing the posterior \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) with respect to \(\sigma\):
Then, in order not to take the noise term in Eq. (8) into account, the probabilistic model of Eq. (9) where \(\sigma \to 0\) is considered. Eventually the predictive distribution where only the contribution of the parameter \({\varvec{a}}\) is considered is represented as follows:
where \(p\left( {{\varvec{a}}{|}{\mathbf{X}},{\mathbf{Y}}} \right)\) indicates the marginal distribution obtained by Eq. (14), \(\delta\) is the Dirac delta function as the limit of normal distributions as \(\sigma \to 0\). In practice, the technical process itself becomes eventually very simple. The predictive distribution \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) in Eq. (15a) can be approximated by the output values of the MCMC sample points themselves in practice (by substituting the MCMC samples for the regression model \(f\left( {{\varvec{x}},{\varvec{a}}} \right)\)):
where \({\varvec{a}}_{i}\) is the \(i\)th sample point \(\left( {{\varvec{a}},\sigma } \right)_{i}\) obtained by the MCMC sampling with omitting the \(\sigma_{i}\) component in the sample point. The meaning of the predictive distribution \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) in Eq. (15a) is different from that of \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)\) in Eq. (13). Both of them are a probability of the output \(y\) after learning from the same data. However, the predictive distribution in Eq. (15a) represents the output uncertainty contributed only by the turbulence closure coefficients \({\varvec{a}}\) which have been learned by considering all the uncertainties that are present. The closure coefficients \({\varvec{a}}\) themselves were learned by taking the information of the uncertainty of the noise into account but the output \(y\) as QoI reflects only the epistemic uncertainty associated by \({\varvec{a}}\) eventually.
Overall for the entire steps, the process is common when a likelihood function is determined. The likelihood function in Eq. (10) is one example. Based on the concept of the four steps, the likelihood function to be evaluated (by either an optimizer or MCMC) can be naturally and flexibly determined for each problem setting. Note that further abstraction of the entire process of the four steps can be generalized by classifying all the stochastic variables into the latent variables and the observed variables (see Appendix 6).
2.3 Practical implementation details
This section provides insight into the practical implementation details with a special focus on computing the likelihood function. The likelihood function represented by Eq. (10) contains \(f\left( {{\varvec{x}}^{\left( i \right)} ,{\varvec{a}}} \right)\), which is obtained through CFD computations. It is iteratively computed during the processes by the optimizer and MCMC, due to variations of the parameters \({\varvec{a}}\) and \(\sigma\). The iteration number of this process could be the order of 103 to 106. It means that this order of times of the CFD computations are needed. This is the most time-consuming part in the whole process. The iteration number is unable to be predicted and this process itself is infeasible in practice especially for 3D aircraft test cases. A surrogate-assisted strategy is used for efficiency. A series of the following surrogate-assisted strategy is conducted in a framework of the DLR’s surrogate and reduced-order modeling toolbox called SMARTy. Recent articles on research works using SMARTy are found in Refs. [20, 21].
Once a surrogate model is constructed, no additional CFD computations are needed. A surrogate model is constructed at each flow condition (and/or at each spatial coordinate) \({\varvec{x}}^{\left( i \right)}\). The process of building a surrogate model is as follows. First, search domains of the turbulence model parameter \({\varvec{a}}\) are required to be decided. We set this as the ranges from 0.5 to 1.5 \({\varvec{a}}_{org}\) as already defined in Table 1, where \({\varvec{a}}_{org}\) is a vector of a set of the original turbulence model coefficients. Sampling locations where CFD computations are executed are determined by design of experiment (DoE). We used Sobol sequence, which is categorized as a kind of quasi Monte Carlo sampling methods [22, 23]. The Sobol sequence is able to keep the uniformity (the low-discrepancy) of the sampling filling the space even when the dimensionality of the input parameter is relatively high. It is also able to add sequentially low-discrepancy sampling points when the original sample size is not sufficient to achieve the accuracy of the constructed surrogate model. Based on the sample points generated by DoE, a surrogate model is constructed. Note that the process of constructing a surrogate model is theoretically the same as the statistical inference method in the previous subsection (a regression model by Eq. (12) or analytical solutions of the dual representation of Eq. (13) in e.g. Gaussian processes, where \({\varvec{x}}\) and \({\varvec{a}}\) in the previous subsection corresponds to \({\varvec{a}}\) and parameters to describe the surrogate model, respectively).
The constructed surrogate model eventually can be recognized as function approximation on the parameter space of \({\varvec{a}}\) at each flow condition (and/or at each spatial coordinate) as \(\tilde{y}^{\left( i \right)} = \tilde{f}\left( {{\varvec{a}};{\varvec{x}}^{\left( i \right)} } \right)\), where noted that \({\varvec{a}}\) is now input but \({\varvec{x}}^{\left( i \right)}\) as a set of parameters of the flow conditions is fixed. In practice, the new input of the surrogate model \(\tilde{f}\) corresponds to a new turbulence model parameter \({\varvec{a}}_{new}\) nominated during the process in the optimizer/MCMC. Thus, the output \(\tilde{f}\left( {{\varvec{a}}_{new} ;{\varvec{x}}^{\left( i \right)} } \right)\), which is iteratively computed in the parameter space \({\varvec{a}}\) during the optimization or MCMC process, is efficiently obtained via the surrogate model at each fixed flow condition (and/or at each spatial coordinate) \({\varvec{x}}^{\left( i \right)}\). Now the most time-consuming part has been replaced by the limited CFD computations on the DoE sample points, which are now predictive in terms of the cost. The required sample size to construct a surrogate model is normally dependent on the dimensionality of the input parameter \({\varvec{a}}\). The sample size is shown later at each test case in Sect. 3. The surrogate models here used are Gaussian processes with hyperparameter tuning for prediction of drag coefficient as \(f\), and additionally assisted by proper orthogonal decomposition (POD) for the cases of prediction of pressure coefficient. The kernel function commonly used in the Gaussian processes is a Gaussian kernel. The hyperparameter tuning as point-estimate to determine the probability distribution is achieved by the MLE. In prediction of pressure coefficients, multiple output of \({\mathbf{Y}} = \left( {y^{\left( 1 \right)} ,y^{\left( 2 \right)} , \ldots ,y^{\left( N \right)} } \right)\) is assisted by the POD at once for multiple input \({\mathbf{X}} = \left( {{\varvec{x}}^{\left( 1 \right)} ,{\varvec{x}}^{\left( 2 \right)} , \ldots ,{\varvec{x}}^{\left( N \right)} } \right)\), where \({\varvec{x}}\) is a spatial coordinate in this case, \(N\) as the sample size can be the number of the surface output of a wing.
During the entire process, a variance-based sensitivity analysis can be additionally inserted for the purpose of selection of dominant model coefficients to QoI as the output. The results of the sensitivity analysis are expected to bring physical aspects of the turbulence model coefficients. It can even bring dimensionality reduction, which can benefit of the computational costs. The required sample size for the DoE is basically dependent on the dimensionality of the input parameter space \({\varvec{a}}\). Therefore, it directly brings reduction of the required number of CFD computations. However, the sensitivity analysis itself requires the whole information using the original dimensionality of the parameter. Hence, it can be useful to use the results of a test case to other test cases to proceed the whole process with the selected coefficients. The global sensitivity of the input variables with respect to the output is computed by DoE samples using the Sobol sequence assisted by the constructed surrogate models. The execution of the Sobol indices itself was done by the Python library called SALib [24]. Details of the theory of the Sobol indices and sensitivity analysis methods in general can be found in Ref. [25].
Finally the flow chart of the data-driven method in this paper is shown in Fig. 2. This flow chart describes the direct process used in the application test cases in Sect. 3. The estimation of the modes by the optimizer can be a strategy for double check if the MCMC works properly.
3 Applications and results
Two case studies were performed to validate and demonstrate the methods presented above: (a) a two-dimensional RAE2822 airfoil and (b) a three-dimensional full aircraft model based on the Airbus XRF1. Analyses of both test cases targeted the transonic regime. For the 2D test case, artificial experimental data was used to validate the proposed framework. Then, for the 3D test case, several sets of proprietary wind tunnel data for XRF1 were used to compare the simulation results against the experimental results. In this article, actual values cannot be published for XRF1 due to reasons of confidentiality. Therefore, quantitative results and/or relative evaluations are shown in Sect. 3.2. The 3D case is, however, valuable to demonstrate that the framework can be applied to real, industrial-grade test cases.
The parameter settings of the MCMC are common for all the test cases in this article. The sample size is 105 and the burn-in ratio is 0.8. The step length for \({\varvec{a}}\) normalized by \({\varvec{a}}_{org}\) and \(\sigma\) normalized by \(\hat{\sigma }\) are 0.025 and 0.05, respectively. Here \(\hat{\sigma }\) is the noise parameter obtained by the deterministic approach obtained by an optimizer (e.g. by Eq. (11)). Note just for visualization issues of the following figures that in the graphs of marginal posterior distributions of \({\varvec{a}}\) and \(\sigma\) obtained by the MCMC, the graphical ranges of \({\varvec{a}}\) and \(\sigma\) are common for all the figures in this article. The range of \({\varvec{a}}\) is from 0.5 to 1.5 \({\varvec{a}}_{org}\). The center is \({\varvec{a}}_{org}\). That of \(\sigma\) is from 0 to 5 \(\hat{\sigma }\) (only Fig. 7 is 0 to 2 \(\hat{\sigma }\)). This indicates that the mode of the posterior of \(\sigma\) should be located one fifth from the left (or the middle in Fig. 7) when the MCMC works well by producing the same results as the optimizer. Note also that all the predictions depicted by lines, and predictive distributions depicted by filled colored areas, have only their valid points/ranges at the discrete input flow conditions since the surrogate models are available for those discrete input flow conditions.
3.1 2D RAE2822 Airfoil
The RAE2822 airfoil is introduced here to validate the methodology and to investigate the characteristics of the turbulence model closure coefficients. The baseline Mach number is 0.734 and it varies in the transonic regime. The angle of attack and the Reynolds number are fixed at 2.79 degree and 6.5 million, respectively. The mesh used for CFD computations and a \(C_{p}\) visualization at this flow condition are shown in Fig. 3. The number of nodes is around 29 thousand.
For the purpose of dimensionality reduction, dominant closure coefficients, with respect to aerodynamic coefficients, are selected based on their Sobol indices. These selected closure coefficients are subsequently used as parameter \({\varvec{a}}\) for the XRF1 test case to reduce the computational costs; the assumption being that the flow characteristics are similar to each other in the transonic regime. Note that the results of the Sobol indices depends on the QoI as the output.
3.1.1 Sobol indices as preprocessing
The required information to generate the Sobol indices is obtained through surrogate models, which contain all of the closure coefficients as input parameters and the QoI as the output value. In this case the drag coefficient (\(C_{d}\)) was selected as the QoI to generate the Sobol indices. The surrogate models of \(C_{d}\) were constructed at each flow condition where the Mach number ranged from 0.704 to 0.784 in 0.01 increment. Each surrogate model was built using 100 DoE sample points generated between 0.5 and 1.5 \({\varvec{a}}_{org}\).
Table 2 shows results of the Sobol indices over the Mach number range. The percentages indicate the relative amount of contribution of the turbulence model coefficients to \(C_{d}\) at each flow condition. The mean values of the percentages at each coefficient are also shown in Table 2. It can be observed from the mean results that \(\kappa\) and \(c_{b1}\) are the most dominant followed by \(\sigma_{SA}\) and \(c_{v1} .\) This ranking of the result has similar characteristics to the works by Papadimitriou et al. [26] and Da Ronch et al. [2] when they investigated the original S–A. They listed \(\kappa\), \(c_{b1}\) and \(c_{v1}\) as the most influential coefficients to the output uncertainty of their quantities of interest. From the result in Table 2, the coefficients resulting from the negative S–A model which are not present in the original S–A model are not influential.
The selection process to determine which coefficients are dominant is similar to the mindset enforced when POD is employed and relies on using eigenvalues. Here we decided to truncate the list of dominant coefficients at 90%. Eventually the coefficients, \(\kappa\), \(c_{b1}\) and the \(\sigma_{SA}\), were selected as the dominant coefficients. These selected coefficients are set to the components of the turbulence model parameter \({\varvec{a}}\) as \({\varvec{a}} = \left( {\sigma_{SA} ,\kappa ,c_{b1} } \right)\) to be inferred in the following test cases. The process described so far is a preprocessing process and leads into the learning process using observed data. From now as a validation test of the framework, data-driven turbulence modeling results are shown by using artificial data.
3.1.2 Inference using drag coefficient
First the drag coefficient (\(C_{d}\)) is used as the QoI. Figure 4 shows marginal posterior distributions of each component of \({\varvec{a}}\) and \(\sigma\) after the learning process, and the obtained predictive distribution. As described in Sect. 2, the posterior distributions \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) and the predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) by Eq. (15) (contributed only by the closure model coefficient \({\varvec{a}}\)) were obtained directly by the MCMC sampling. Figure 4a also includes the result of \(\hat{\user2{a}}\) obtained through the optimization for comparison.
The artificial data is also depicted in Fig. 4b. It was created based on CFD results obtained by using \({\varvec{a}}_{org}\) with independently and randomly generated noise from the Gaussian distribution \({\mathcal{N}}\left( {0,\sigma_{org}^{2} } \right)\), where \(\sigma_{org}\) is a constant value set to \(\sigma_{org}\) = 0.001 (10 drag counts), which was manually set by the authors. Since we know that the generated noise follows an independently identically distributed (i.i.d) Gaussian distribution, in this case, it is explicitly justified to use Eq. (10) as the likelihood function. It can be confirmed from Fig. 4a that \(\sigma_{SA}\) as \(a_{1}\) (the first element of \({\varvec{a}}\)) has greater uncertainty than the other coefficients in the relative ranges defined by \({\varvec{a}}_{org}\). In Fig. 4b, a reference result of the output obtained by \({\varvec{a}}_{org}\) and a result obtained by \(\hat{\user2{a}}\) (by the point-estimate) are depicted by the dotted orange lines and the green lines, respectively. The values of \(\hat{\user2{a}}\) and \(\hat{\sigma }\) obtained by the optimizer is \(\hat{\user2{a}}\) = [0.801, 0.417, 0.129] and \(\hat{\sigma }\) = 0.000689, respectively. Hence \(\hat{\user2{a}}\) obtained by the optimizer agrees well with the modes of the marginal posterior distributions obtained by the MCMC in Fig. 4a, However, the obtained \(\hat{\user2{a}}\) and \(\hat{\sigma }\) could not detect \({\varvec{a}}_{org}\) and \(\sigma_{org}\) as the reference (each component of \({\varvec{a}}_{org}\) is at the center of the horizontal axis). Nevertheless, it can be confirmed from Fig. 4a that the marginal posterior distributions are wide and covers \({\varvec{a}}_{org}\), \(\sigma_{org}\) with high probability density except for the first component of \({\varvec{a}}_{org}\). Therefore, a lack of the data for this amount of noise has been revealed by the results of the posterior obtained by the MCMC. It is emphasized that this information is never detected by the point-estimate.
Thus, this test case using the artificial data is not only validating the developed method but moreover offers further insight into underlying flow physics. For example, it can be observed that the influence of the turbulence closure coefficients is less when predicting the drag coefficient for a Mach number around 0.74 in contrast to higher and lower Mach numbers.
Next, the same test case but using prior information is also investigated. Prior can be set to any arbitrary probability of the parameters \({\varvec{a}}\) and \(\sigma\) independently, e.g. as \(p\left( {\varvec{a}} \right)\). So far, the constraints of the parameter as \({\varvec{a}} \in\)[0.5 \({\varvec{a}}_{org}\),1.5 \({\varvec{a}}_{org}\)] have been used in the process of the optimization and MCMC. Hence, these constraints are implicitly employed as prior information in the form of uniform distributions for all the test cases (see Appendix 1). In contrast, we enforce a prior distribution for \({\varvec{a}}\) which is Gaussian and can be represented as:
where \({\varvec{\sigma}}_{pri} = \left( {\sigma_{pri,1} ,\sigma_{pri,2} , \cdots ,\sigma_{pri,M} } \right)\) and \(M\) is the dimensionality of the parameter \({\varvec{a}}\). Therefore it is the number of the turbulence model coefficients to be inferred. \(\sigma_{pri,j}\) indicates the variance of the \(j\) th element of the parameter \({\varvec{a}}\).
Figures 5 and 6 show marginal posterior distributions and predictive distributions, respectively, when the specific priors expressed by Eq. (16) are respected during the inference process. Priors for each element of the parameters can be independently set. Therefore, prior information is introduced at three different locations: (A) only for \({\varvec{a}}\), (B) only for \(\sigma\) and (C) for both \({\varvec{a}}\) and \(\sigma\). The function to be evaluated by the optimizer and the MCMC in case (A) is Eq. (16). Those for cases (B) and (C are \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}},{\varvec{a}},\sigma } \right)p\left( {\varvec{a}} \right)\) and \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}},{\varvec{a}},\sigma } \right)p\left( {\varvec{a}} \right)p\left( \sigma \right)\), respectively, where \(p\left( \sigma \right)\) can be obtained by as again following Eq. (16). The mode as (\(\hat{\user2{a}}\), \(\hat{\sigma }\)) obtained by direct optimization for each case is also described in Fig. 5.
Due to the prior functions, the posteriors shown in Fig. 5 can be regarded as having updated from those in Fig. 4a. For example, in case (B), the uncertainty of \(\sigma\) as noise is reduced. In Fig. 5a, the uncertainty of \({\varvec{a}}\) is also reduced and the mode moved closer to \({\varvec{a}}_{org}\). Moreover, the resulting predictive distribution \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) in Fig. 6a is smaller compared to Fig. 4b and is distributed around the reference result.
3.1.3 Inference using pressure coefficient
After using the global quantity \(C_{d}\) as the QoI, next the pressure coefficient (\(C_{p}\)) distributions is investigated. As for the drag coefficient analysis, artificial noise data is generated. Similar to before, the data was created based on CFD results using \({\varvec{a}}_{org}\) with independently and randomly generated noise from a Gaussian distribution \({\mathcal{N}}\left( {0,\sigma_{org}^{2} } \right)\), where \(\sigma_{org}\) = 0.1. The likelihood function is expressed as follows since the probabilistic model of the noise \(\varepsilon\) is assumed to be a Gaussian distribution \({\mathcal{N}}\left( {0,\sigma^{2} } \right)\) so that the probabilistic model of the output \(y\) that we defined is \({\mathcal{N}}\left( {f\left( {{\varvec{x}}^{\left( i \right)} ,{\varvec{a}}} \right),\sigma^{2} } \right)\). Therefore the likelihood function when the dataset \({\mathcal{D}} = \left( {{\mathbf{X}},{\mathbf{Y}}} \right)\) is:
where \(N_{data}\) and \(N_{j}\) indicate the number of data (equivalent to the number of the flow conditions) and the number of the available pressure coefficient at each flow condition, respectively. Note that this equation means that one unique variance \(\sigma^{2}\) to represent the output noise \(\varepsilon\) is assumed to be common at all the flow conditions, i.e. assuming that the noise level is the same at any flow conditions.
Figure 7 shows marginal posterior distributions and their predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) at Mach numbers of 0.704, 0.734 and 0.764. The reference results obtained by \({\varvec{a}}_{org}\) and the results by the point-estimate are described as the dotted orange lines and the green lines, respectively. It can be seen that the shock waves around the mid-chord when the Mach number is at 0.704 and 0.734 can be accurately detected by the deterministic approach (the green line match the reference depicted by the orange dotted lines). The uncertainty shown as 95% credible interval (CI) is wider at the Mach number of 0.704. This uncertainty is the epistemic uncertainty associated with the turbulence closure coefficients. On the other hand, the point-estimate could not detect the reference at the Mach number of 0.764 but the resulting wide uncertainty (shown as 95% CI) by the interval-estimate covers the reference. These results mean that the point-estimate cannot detect the reference but the interval-estimate can as a probability and furthermore can indicate sensitive flow domains/conditions accordingly, which was not included in the given artificial data as information.
3.2 XRF1 generic long-range transport aircraft
In this section the XRF1 is investigated relying on high-quality but proprietary wind tunnel data at transonic flow conditions and flight Reynolds numbers. XRF1 is an Airbus provided industrial standard multi-disciplinary research testcase representing a typical configuration for a long-range wide body aircraft. The XRF1 research testcase is used by Airbus to engage with external partners on development and demonstration of relevant capabilities / technologies.
The XRF1 wind tunnel data set at hand consists of drag (\(C_{D}\)) and pressure coefficients (\(C_{p}\)) at nine flow conditions, namely Case 1 to Case 9. The Mach number and the Reynolds number are fixed between the cases while the angle of attack (AoA) is varied. The Mach number and the Reynolds number are around 0.86 and 25 million, respectively. All the flight conditions are within the transonic regime and subsequently shock waves are present on the wing. In this test case, no specific prior is used (the improper prior is implicitly used) to learn the coefficients only from the wind tunnel data.
The inference is executed on two problems:
-
1.
use \(C_{D}\) data from all the nine cases at once,
-
2.
use \(C_{p}\) data available at three sections of the wing focusing on one Case only.
At each flow condition, the aircraft configuration is deformed due to its aeroelastic response. However, in this paper this influence is neglected and a fixed aircraft configuration is used for CFD computations for all nine cases. This fixed configuration has the surface deformation measured for Case 4. Hence for the second analysis focusing on \(C_{p}\) no deformation inconsistency is present. For the \(C_{D}\) analysis the inconsistency in surface deformations is expected to emerge to some extend as the variance \(\sigma^{2}\) in Eq. (9). Figure 8 shows the corresponding grid used for RANS computations. The number of grid nodes is around 6.3 million. As one reference showing a validation result of the grid (which also causes an epistemic uncertainty but evaluated in the noise parameter when it exists), we take an example that the difference of \(C_{D}\) between the corresponding data and a RANS computation using the original turbulence model coefficient \({\varvec{a}}_{org}\) is 3 drag counts (where one drag count is 10–4).
The results of the Sobol indices for the RAE2822 airfoil are used to downselect the turbulence model parameters which are inferred for the XRF1 assuming that both cases share underlying physics due to the transonic flow regime. Hence, the turbulence model coefficients \(\kappa\), \(c_{b1}\) and \(\sigma_{SA}\) are the inferred coefficients group in the parameter \({\varvec{a}}\) (\({\varvec{a}} = \left( {\sigma_{SA} ,\kappa ,c_{b1} } \right)\)). All neglected coefficients are fixed at the original S–A model values given in Table 1.
Surrogate models for the drag coefficient are constructed at each flow condition while for the pressure coefficient distributions one model for each wing section is computed. Since the dimensionality of the turbulence model parameter \({\varvec{a}}\) is 3, the required number of CFD computations has been reduced compared to the 12-dimensional airfoil case. A minimum of 70 DoE sample points to construct a surrogate model at each flow condition is used. This varying number of DoE samples is caused from diverging CFD simulations which occur once a combination of closure coefficients has been selected which violate the underlying RANS modeling assumptions.
3.2.1 Inference using drag coefficient
First, turbulence model coefficients are inferred based on the drag coefficient \(C_{D}\). The dataset used consisting of the input \({\mathbf{X}}\) and the output \({\mathbf{Y}}\) at all the nine flow conditions as \({\mathbf{X}} = \left( {{\varvec{x}}^{{\left( {case1} \right)}} ,{\varvec{x}}^{{\left( {case2} \right)}} , \cdots ,{\varvec{x}}^{{\left( {case9} \right)}} } \right)\) and \({\mathbf{Y}} = \left( {{\varvec{y}}^{{\left( {case1} \right)}} ,{\varvec{y}}^{{\left( {case2} \right)}} , \cdots ,{\varvec{y}}^{{\left( {case9} \right)}} } \right)\), where \({\varvec{x}}\) and \({\varvec{y}}\) are the AoA and \(C_{D}\), respectively. The experimental data for \(C_{D}\) is shown in Fig. 9. As aforementioned, the deformation used for all computations is fixed and belongs to Case 4 which is the case of the second lowest angle of attack. Figure 9 also depicts the possible \(C_{D}\) region in the search space \({\varvec{a}} \in\)[0.5 \({\varvec{a}}_{org}\),1.5 \({\varvec{a}}_{org}\)]. This region is spanned by directly using all available DoE sample pointsand plotting their minimum and maximum values. It has to be noted that this coverage does not guarantee that a possible combination of parameters exists for each \(C_{D}\) value inside. Nevertheless, it can be seen that all experimental data points are within the covered region. The \(C_{D}\) characteristics using the original coefficients \({\varvec{a}}_{org}\) are also described in Fig. 9.
Results obtained by the point-estimate and the interval-estimate are shown in Fig. 10. Since the probabilistic model is exactly the same as that in the case of RAE2822 in the previous sub-section, the likelihood function used for the learning process is Eq. (10) as well. First of all, the results of the point-estimate are discussed. After the learning process, the parameters \({\varvec{a}}\) and \(\sigma\) in Eq. (10) are calibrated as \(\hat{\user2{a}}\) and \(\hat{\sigma }\). Figure 10 shows \(C_{D}\) characteristics over the nine angles of attack. The green line indicates the results computed by using the calibrated turbulence model parameter \(\hat{\user2{a}}\). At the angle of attack of Case 4, the predicted \(C_{D}\) by the calibrated parameter \(\hat{\user2{a}}\) (\(C_{D}^{calib}\)) corresponds well to the underlying data. For the other cases, \(C_{D}^{calib}\) is between the data \({\mathbf{Y}}\) and \(C_{D}^{org}\) at each angle of attack, which is the qualitative information obtained by the point-estimate. The difference between \(C_{D}^{calib}\) and \({\mathbf{Y}}\) is treated as the noise produced by a probability with the calibrated parameter \(\hat{\sigma }\). This noise represents all of the uncertainties except the uncertainty associated with the model parameter \({\varvec{a}}\). By following the model definition in sect. 2.2, the inaccuracy of the configuration due to aeroelasticity is also treated as the noise.
When following a stochastic approach, the marginal posterior distributions shown in the upper part of Fig. 10 have large uncertainties especially on \(a_{1}\) (= \(\sigma_{SA}\)) since it is fully distributed in the constrained range of [0.5 \({\varvec{a}}_{org}\), 1.5 \({\varvec{a}}_{org}\)]. The mode is around the lower bound. The noise parameter \(\sigma\) is also widely distributed and spreads around four times of the mode \(\hat{\sigma }\). The predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) are depicted in the lower part of Fig. 10. It can be observed that the 95% CI at the highest AoA is quite wide compared to lower AoA. The predictive distribution at the second highest AoA is relatively narrow and the data point at this condition is not within the 95% CI. The noise parameter \(\sigma\) is actually estimated quite high even though the exact drag counts of \(\sigma\) cannot be described here. Therefore, the predictive distribution considering all the uncertainties computed by Eq. (13), should contain the data points. However, in this particular application case, there exists external noise due to the geometry difference which falsifies this assumption.
3.2.2 Inference using pressure coefficient
As described in the 2D airfoil case, at each flow condition, the amount of the information when using \(C_{p}\) distributions is much greater than when using \(C_{D}\). However, for practically relevant cases, \(C_{p}\) measurements are usually available only at several sections/locations. Figure 11 shows \(C_{p}\) distributions of the experimental data and feasible CFD solutions for Case 4. Three sections are available and each of them cover the whole surface of the section where \(C_{p}\) data is measured. namely Section \(i\) to Section \(iii\) ordered from wing root to tip. Since the angle of attack at Case 4 is negative, shock waves are generated on the lower surface of the wing.
As the assumption in the inference process introduced in Sect. 2.2, the turbulence model parameter \({\varvec{a}}\) is stochastic. It is neither a spatio-temporal parameter like a function of the time and space nor a function of flow conditions. In definition of probabilistic models, we introduce two derivations (A) and (B) from two hypotheses:
-
A.
the same noise parameter at each section as one \(\sigma\)
-
B.
different noise parameters at each section as \(\sigma^{\left( j \right)}\), where \(j\) = 1,2,3
The likelihood function in the hypothesis (A) is the same as Eq. (17) in the 2D RAE2822 test case as follows:
where \(x^{{\left( {i,j} \right)}}\) indicates in this case the spatial coordinate of \(i\) th point at the section \(j\), and \(N_{sec}\) indicates the number of the sections. Here we use the three sections, therefore \(N_{sec}\) = 3. A common noise parameter denoted by \(\sigma\) is set for all the sections.
On the other hand, the noise parameter of the likelihood function in the hypothesis B) can be expressed by a vector. Each element of the parameter vector consists of a different noise parameter \(\sigma^{\left( j \right)}\) at each section \(j\). The likelihood function is slightly modified from Eq. (18) as:
where \({\varvec{\sigma}} = \left( {\sigma^{\left( 1 \right)} ,\sigma^{\left( 2 \right)} , \ldots ,\sigma^{{\left( {N_{sec} } \right)}} } \right)\). The probabilistic models of the hypotheses (A) and (B) are summarized in Appendix 4 for further descriptions of the hypotheses behind.
The point-estimate and the interval-estimate are conducted on Eqs. (18) and (19) in order to obtain \(\hat{\user2{a}}\), \(\hat{\user2{\sigma }}\) (by the point-estimate), and \(p\left( {{\varvec{a}},{\varvec{\sigma}}} \right)\) (by the interval-estimate), where \(\hat{\user2{\sigma }} = \left( {\hat{\sigma }} \right)\) in Eq. (18), \(\hat{\user2{\sigma }} = \left( {\hat{\sigma }^{\left( 1 \right)} ,\hat{\sigma }^{\left( 2 \right)} , \cdots ,\hat{\sigma }^{{\left( {N_{sec} } \right)}} } \right)\) in Eq. (19), respectively.
Figure 12 shows the marginal posterior distributions and predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\) when a unique noise parameter is introduced. In the predictive distributions at all three sections, the epistemic uncertainties associated with the parameter \({\varvec{a}}\) (shown as 95% CI in the figure) are slightly wider around the shock locations and downstream thereof compared to more benign flow phenomena. However, they are so small that they do not cover the experimental data. In fact, the predictive distribution at section \(i\) even diverged further from the experimental data than the CFD prediction relying on the original model parameters \({\varvec{a}}_{org}\). These results are mainly caused by introducing the same noise parameter for all the sections.
On the hypothesis B using Eq. (19), a more flexible assumption is used in which noise at each section is expressed independently. Figure 13 shows the marginal posterior distributions and predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\). The overall characteristics are similar to those of the hypothesis (A). However, the predictive distribution covers the shock wave at Section \(i\) slightly better than hypothesis A). Hence, assuming different noise parameters for each section seems more promising than assuming a common noise parameter. However, the results in Fig. 13 also clearly indicate that there is no possibility for the turbulence model parameter to properly predict all flow phenomena at the three sections by adjusting just three turbulence model parameters and aiming for a global model description.
Based on the previous results, the flexibility is further increased by allowing different turbulence model parameters \({\varvec{a}}\) for each section as \({\varvec{a}}^{\left( j \right)}\), where \(j\) = 1,2,⋯,\({ }N_{sec}\) (named hypothesis (C)). The likelihood function at each section is identical to Eq. (10). Figure 14 shows three sets of marginal posterior distributions on the three sections and the respective predictive distributions \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}}} \right)} \right|_{{\varvec{a}}}\). It can be observed in Fig. 14 that the shock wave at each section can be better detected by independently inferred parameter \({\varvec{a}}^{\left( j \right)}\). As before, the uncertainties associated with the individual parameters \({\varvec{a}}\) are larger around the shock locations. This showcases that the developed framework is able to detect location of increased physical complexity, e.g. shocks, and highlights them by a larger local uncertainty.
A natural extension of the previous strategy is the assumption that the parameter is spatially varying in general. For example, Duraisamy et al. [7] proposed a model correction term and then inferred it employing a learning process. Since the spatially varying parameter has the same dimensionality than the underlying computational grid, computations of a posterior by MCMC are infeasible and they are achieved by the other approximation methods based on definition of posterior distributions such as the variational inference and the Laplace approximation (see Appendix 2).
4 Conclusions
A framework for data-driven turbulence modeling under uncertainties was developed and applied to a 2D airfoil and an industrially-relevant 3D full aircraft case. The objective of this framework is to quantify the uncertainty of the closure coefficients and that of the quantities of interest associated with the closure coefficients by employing a Bayesian approach. In this paper, the closure coefficients of the Spalart Allmaras turbulence model were inferred. The partial availability of information, which is common for cases of industrial relevance, is regarded as an epistemic uncertainty that is addressed by a fully Bayesian approach, with special treatment for computing the predictive distribution. The resulting uncertainties can provide an insight into one of the model uncertainties due to the closure coefficients for predicting the quantities of interest by distinguishing it from other external uncertainties. In this work, these other external uncertainties are treated as noise. The other possible epistemic uncertainties in the CFD simulations such as the grid convergence were eliminated as much as possible in advance though they are classified into the noise if they exist. The uncertainties associated by all the parameters, which are the closure coefficients and the noise parameter, in the fully Bayesian model were rigorously evaluated by a Markov-Chain Monte Carlo (MCMC) sampling technique. To overcome the bottleneck of high computational cost in the MCMC process associated with repetitive RANS evaluation, a surrogate-based methodology has been employed.
The 2D airfoil test case with artificial data has been used to validate the methodologies integrated in the framework. Moreover, dominant closure coefficients were selected by using Sobol indices as a preprocessing step that further increased the efficiency of the surrogate-based approximation. It was shown that known model coefficients can be rediscovered by relying on data. Furthermore, it was confirmed not only that the closure coefficients and the artificial noise were replicated by the developed method but also that the epistemic uncertainty associated with the closure coefficients are large around the shock locations. What is important in this replication test case is that the information of this epistemic uncertainty was not specified in the problem setting but was derived by the developed method.
For the 3D aircraft case the information obtained from the airfoil Sobol indices analysis has been reused to keep computational cost at a feasible level. Inference of coefficient by analyzing pressure coefficients showed that the epistemic uncertainty associated by the coefficients are highly detected in regions of sensitive flow, such as the shock position and downstream thereof. It was also shown that the uncertainty associated by the noise is quite large compared with that by the coefficients in this test case.
Obtained results, and in particular the epistemic uncertainties associated by the coefficients, can be used for various applications. For example, one option could be to feedback information to the experimental measurement strategies to determine at which locations on the wing new measurement devices should be placed. Another opportunity is the propagation of uncertainties towards unobservable flow conditions with the assistance of surrogate models and the posterior of the closure coefficients to enhance the modeling of e.g. loads at the edge of the flight envelope. The developed framework is flexible in terms of modelling, i.e. other closure models can be applied as well as other probabilistic models.
Abbreviations
- \({\varvec{x}}\) :
-
A vector-valued input variable
- \(y\) :
-
A scalar-valued output variable
- \({\mathbf{X}}\) :
-
A dataset of input variable \({\varvec{x}}\)
- \({\mathbf{Y}}\) :
-
A dataset of input variable \(y\)
- \({\varvec{a}}\) :
-
A parameter vector in general
- \(\sigma\) :
-
Standard deviation
- \({\varvec{\theta}}\) :
-
A hyperparameter vector in general
- \(f\) :
-
A function to produce the output value by CFD with a turbulence model
- \(\varepsilon\) :
-
A stochastic variable representing noise
- \(p\) :
-
A probability
- \(N\) :
-
Sample size of a dataset
- \({\mathcal{N}}\) :
-
Normal distribution
- \({\mathcal{D}}\) :
-
A dataset as \({\mathcal{D}} = \left( {{\mathbf{X}},{\mathbf{Y}}} \right)\)
- CFD:
-
Computational fluid dynamics
- RANS:
-
Reynolds-Averaged Navier–Stokes
- S-A:
-
Spalart–Allmaras
- UQ:
-
Uncertainty quantification
- QoI:
-
Quantities of interest
- MCMC:
-
Markov-chain Monte Carlo
- DoE:
-
Design of experiment
- POD:
-
Proper orthogonal decomposition
- CI:
-
Credible interval
- AoA:
-
Angle of attack
- 2D:
-
Two-dimensional
- 3D:
-
Three-dimensional
- \(C_{d}\) :
-
Drag coefficient of a 2D profile
- \(C_{D}\) :
-
Drag coefficient of a 3D profile
- \(C_{p}\) :
-
Pressure coefficient
- \(i\) :
-
\(i\)th element of a vector
- \(\left. {} \right|_{{\varvec{a}}}\) :
-
Predictive distribution contributed only by the parameter \({\varvec{a}}\)
- \(org\) :
-
Original value
- \(new\) :
-
A new value to distinguish with the others
- \(pri\) :
-
A prior distribution
- \(\left( i \right)\) :
-
\(i\)th sample point of a dataset
- \(\widehat{{~}}\) :
-
Fixed by a point-estimate approach for a stochastic variable
- \(\widetilde{{~}}\) :
-
Approximated by a surrogate model for a function
References
Brunton, S.L., Noack, B.R., Koumoutsakos, P.: Machine learning for fluid mechanics. Annu. Rev. Fluid Mech. 52, 477–508 (2020)
Tinoco, E.N., Brodersen, O.P., Keye, S., et al.: Summary of Data from the Sixth AIAA CFD Drag Prediction Workshop: CRM Cases 2 to 5. Journal of Aircraft (2018), vol. 55, no. 4 (2018)
Da Ronch, A., Panzeri, M., Drofelnik, J., et al.: Sensitivity and calibration of turbulence model in the presence of epistemic uncertainties. CEAS Aeronaut. J. 11, 33–47 (2020)
He, X., Zhao, F., Vahdati, M.: Machine Learning Uncertainty Quantification of Spalart-Allmaras Turbulence Model for Compressors. GPPS Paper No. 2019-BJ-0050 (2019)
Singh, A.P., Duraisamy, K.: Using field inversion to quantify functional errors in turbulence closures. Phys. Fluids 28(4), 045110 (2016)
Parish, E.J., Duraisamy, K.A.: Paradigm for data-driven predictive modeling using field inversion and machine learning. J. Comput. Phys. 305, 758–774 (2016)
Duraisamy, K., Singh, A.P., Pan, S.: Augmentation of turbulence models using field inversion and machine learning. In: Proceedings of 55th AIAA Aerospace Sciences Meeting, AIAA 2017–0993, (2017)
Bishop, C.M.: Pattern Recognition and Machine Learning (Information Science and Statistics). Springer, New York (2007)
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods. Springer, New York (2004)
Ray, J., Lefantzi, S., Arunajatesan, S. et al.: Bayesian calibration of a k-ε turbulence model for predictive jet-in-crossflow simulations. In: Proceedings of 44th AIAA Fluid Dynamics Conference, AIAA 2014–2085 (2014)
Subbian, G., Botelho e Souza, A.C., Radespiel, R., et al.: Calibration of an extended eddy viscosity turbulence model using uncertainty quantification. In: Proceedings of AIAA Scitech 2020 Forum, AIAA 2020–1031 (2020)
Edeling, W.N., Cinnella, P., Dwight, R.P., et al.: Bayesian estimates of parameter variability in the k-ε turbulence model. J. Comput. Phys. 258, 73–94 (2014)
Schaefer, J., Hosder, S., West, T., et al.: Uncertainty quantification of turbulence model closure coefficients for transonic wall-bounded flows. AIAA J. 55(1), 195–213 (2017)
Xiao, H., Cinnella, P.: Quantification of model uncertainty in RANS simulations: a review. Prog. Aerosp. Sci. 108, 1–31 (2019)
Galle, M., Gerhold, T., Evans, J.: Parallel computation of turbulent flows around complex geometries on hybrid grids with the DLR-TAU code. In: A. Ecer, D.R. Emerson (Eds.), Proc. 11th Parallel CFD Conf. (1999)
Gerhold, T., Hannemann, V., Schwamborn, D.: On the validation of the DLR-TAU code. In: W. Nitsche, H.-J. Heinemann, R. Hilbig (Eds.), New Results in Numerical and Experimental Fluid Mechanics, Notes on Numerical Fluid Mechanics, vol. 72, Vieweg, ISBN 3-528-03122-0, pp. 426–433 (1999)
Schwamborn, D., Gerhold, T., Heinrich, R.: The DLR TAU-code: Recent applications in research and industry, invited lecture. In: Wesseling, P., Oate, E., Priaux, J (Eds.), Proceedings of the European Conference on Computational Fluid Dynamics (ECCOMAS CFD 2006) (2006)
Allmaras, S.R., Johnson, F.T., Spalart, P.R.: Modifications and clarifications for the implementation of the Spalart-Allmaras turbulence model. In: Seventh International Conference on Computational Fluid Dynamics (ICCFD7), ICCFD7-1902 (2012)
Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B 63(3), 425–464 (2001)
Bekemeyer, P., Matteo, R., Heinrich, R., et al.: Nonlinear unsteady reduced-order modeling for gust-load predictions. AIAA J. 57(5), 1839–1850 (2019)
Stradtner, M., Liersch, C.M., Bekemeyer, P.: An aerodynamic variable-fidelity modelling framework for a low-observable UCAV. Aerospace Sci. Technol. 107, 106232 (2020)
Sobol, I.M.: Distribution of points in a cube and approximate evaluation of integrals. Zh. Vychisl. Mat. Mat. Fiz. 7(4), 784–802 (1967)
Joe, S., Kuo, F.Y.: Remark on Algorithm 659: implementing Sobol’s quasirandom sequence generator. ACM Trans. Math. Softw. 29, 49–57 (2003)
Herman, J., Usher, W.: An open-source Python library for sensitivity analysis. J. Open Source Softw. 2(9), 97 (2017). https://doi.org/10.21105/joss.00097
Iooss, B., Lemaître, P.A.: Review on Global Sensitivity Analysis Methods. Uncertainty Management in Simulation-Optimization of Complex Systems, pp. 101–112. Springer, New York (2004)
Papadimitriou, D.I., Papadimitriou, C.: Bayesian uncertainty quantification of turbulence models based on high-order adjoint. Comput. Fluids 120, 82–97 (2015)
Acknowledgements
The DLR authors would like to thank Airbus for providing the XRF1 testcase as a mechanism for demonstration of the approaches presented in this paper. The first author would like to thank to Bernhard Eisfeld and Vamshi Togiti at DLR for discussion and assistances on the DLR-TAU code and for valuable comments, and expert knowledge on the turbulence models. The first author would also like to thank to Jan Himisch, Mohammed Abu-Zurayk, Caslav Ilic, Andrei Merle at DLR, and Thomas Engelbrecht, Cross Murray, Paul White and Alan Mann at Airbus for data offering and valuable discussion on the XRF1 test case.
Funding
Open Access funding enabled and organized by Projekt DEAL.
Author information
Authors and Affiliations
Corresponding authors
Additional information
Executive Editor: Bernd Noack.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Appendix
Supplementation of the omitted explanations of the methodologies and details of the theory in background in Sect. 2.2 and Sect. 3 are described here.
1.1 1. The function to be evaluated by optimizer/MCMC
In Sect. 2.2, it is explained that a likelihood function is evaluated by MCMC. Accurately to say, it has to be a posterior function by the Bayes’ theorem. It needs to be transformed into a form of a probability in terms of the parameters \({\varvec{a}}\) and \(\sigma\) as from \(p\left( {y{|}{\varvec{x}},{\varvec{a}},\sigma } \right)\) to \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\). This is the transformation from a likelihood \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}},{\varvec{a}},\sigma } \right)\) to a posterior \(p\left( {{\varvec{a}},\sigma {|}{\mathbf{X}},{\mathbf{Y}}} \right)\) and can be achieved by the Bayes’ theorem as:
where \(p\left( {{\varvec{a}},\sigma } \right)\) is a joint prior of the parameters \({\varvec{a}}\), \(\sigma\) that can be defined arbitrarily. In practice, we can set independent priors for each element of the parameters, e.g. \(p\left( {{\varvec{a}},\sigma } \right)\) = \(p\left( {\varvec{a}} \right)p\left( \sigma \right)\) since \({\mathbf{Y}}\) is not observed yet. \(p\left( {\mathbf{Y}} \right)\) is the denominator in Bayes’ theorem as being the normalization constant required to ensure the posterior to be a probability density. This normalization constant can be represented by an integral of the numerator in the right-hand side but is normally incomputable. However, the formulation needed to be evaluated is the posterior by Eq. (20) in theory.
However, in practice, the likelihood function \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}},{\varvec{a}},\sigma } \right)\) or the product of that and a prior \(p\left( {{\varvec{a}},\sigma } \right)\) is sufficient for the formulation to be evaluated when using optimizer/MCMC. Therefore, the formulation to be evaluated is simply the product of the likelihood and a prior:
When no specific prior wants to be assumed (purely evaluated only by currently observed data), improper prior such as \(p\left( {{\varvec{a}},\sigma } \right)\) = 1 to be \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}},{\varvec{a}},\sigma } \right)p\left( {{\varvec{a}},\sigma } \right)\) = \(p\left( {{\mathbf{Y}}{|}{\mathbf{X}},{\varvec{a}},\sigma } \right)\) can be used. It is rational when no nonlinear transformation of the parameters and the normalize constant can be obtained [8]. Eventually the function to be evaluated by optimizer/MCMC becomes the likelihood function itself. Note that like a likelihood function in general, the improper prior either does not guarantee the integral of it to be 1 over the parameters \({\varvec{a}},\sigma\). The improper prior can be applicable since the normalization constant can be computed by the MCMC and since no nonlinear mapping of the parameter \({\varvec{a}}\) is operated in the whole procedure. Constraints of the parameter \({\varvec{a}}\) (e.g. \({\varvec{a}}\) ∈ [0.5 \({\varvec{a}}_{org}\),1.5 \({\varvec{a}}_{org}\)] used in the main context) can be also treated as prior information. The prior function of these constraints is a step function extended from the improper prior as:
Therefore, all of the likelihood functions (Eqs. (10), (17–19)) introduced in the main context is actually:
where \(p\left( {{\varvec{a}},\sigma } \right)\) = \(p\left( {\varvec{a}} \right)p\left( \sigma \right)\), and \(p\left( \sigma \right)\) = 1 as the improper prior. This expression is totally omitted in the main context to avoid complexity. In practice the posterior to be a probability that the integral of posterior as a probability density function is guaranteed to be 1 by using the MCMC.
1.2 2. How to compute the predictive distributions
First of all, it has to be reminded that cases where the regression model defined by Eq. (7) is a linear regression model, and the probabilistic model defined by Eq. (9) is a Gaussian distribution, can analytically represent the predictive distribution in general [8]. The posterior can be analytically obtained by assist of a conjugate prior, which can share the same probability distribution between the prior and the posterior, and leads to efficient update of the posterior. These cases are often used in practical applications. In fact, some properties in this process to obtain analytical solutions are applied to the approximation methods for their efficiency. For example, one of the most widely used analytical representations is the case where the noise parameter (of a Gaussian probabilistic model) is deterministic and the prior is a Gaussian distribution. This modeling can be categorized in one of the empirical Bayes approaches shown in Appendix 3. In this case, the posterior becomes also a Gaussian distribution. This approach would be rather used for rapid approximations in iterative processes. On the other hand, when the noise parameter is treated as stochastic such as the case in this ariticle, the conjugate prior is not a Gaussian distribution but a Gauss-Gamma distribution and more generally a Gaussian-Wishart distribution for multiple input parameters.
In our applications, since the regression model is a nonlinear regression model, the predictive distribution by Eq. (13) does not have analytical solutions even in the empirical Bayesian approaches (see Appendix 3). As an example of other cases, the neural networks in general can be also regarded as nonlinear regression models in general, so as it to the same.
There are several approximation methods in computing the predictive distributions [8]
-
MCMC,
-
Laplace approximation,
-
variational inference.
The first one is a method with a help of the MCMC. The integral in Eq. (13) is replaced by the samples. The aim of the latter two methods is basically to approximate the posterior distribution by a parametric distribution. These can be rather used in terms of efficiency or intractability due to high dimensionality of the parameter to be inferred. These require other alternative computations such as optimization (used for the point-estimate) and Hessian predictions of the regression model \(f\left( {{\varvec{x}},{\varvec{a}}} \right)\), the second derivatives with respect to \({\varvec{a}}\). If computational resources are sufficient, the MCMC strategy is expected to be more accurate in general than the others.
When computing the predictive distributions with all mixed uncertainties represented by Eq. (13), the computation is a mixture of Gaussian distributions, which consists of equally weighted \(N_{mcmc}\) Gaussian distributions, each of which is expressed by a known mean and variance as:
The predictive distribution is not guaranteed to be a Gaussian distribution but its mean and variance can be analytically obtained by the generated MCMC samples in hand whereas it is computationally expensive to estimate the distribution itself by nonparametric approaches.
Thus, the predictive distribution at arbitrary output value \(y\) represented by Eq. (13) can be computed by the mixture of the distributions \(p\left( {y{|}{\varvec{x}},\left( {{\varvec{a}},\sigma } \right)^{\left( i \right)} } \right)\), where \(i\) = 1,2,⋯,\(N_{mcmc}\). Each distribution \(p\left( {y{|}{\varvec{x}},\left( {{\varvec{a}},\sigma } \right)^{\left( i \right)} } \right)\) is properly weighted by dividing the sum of them by the MCMC sample size \(N_{mcmc}\). However, although any complex distributions can be represented by the MCMC sample points, the probability \(p\left( y \right)\) is not parametric. Therefore, the probability needs to be computed at arbitrary \(y\). The direct computation of that is expensive so that approximation methods are recommended to be used.
An approximation example when the empirical Bayes approach is described here. The empirical Bayes approach is to treat the hyperparameter (the noise parameter \(\sigma\) in the main context) as deterministic (e.g. as \(\hat{\sigma }\)). The mixture of Gaussian expressed by Eq. (24) is adjusted to as follows in the empirical Bayes approach:
where \(\hat{\sigma }\) is a fixed noise parameter. The concept of this approximation is the same as the Laplace approximation and the variational inference shown in the above. It is to approximate the posterior distribution \(p\left( {\varvec{a}} \right)\) as a Gaussian distribution. Then only two parameters are needed to completely describe the distribution. Therefore, the mean \({\varvec{\mu}}_{{{\varvec{a}},\sigma }}\) and variance \(\sigma_{{\varvec{a}}}^{2}\) are computed from the MCMC samples describing \(p\left( {\varvec{a}} \right)\). The variance \(\sigma_{{\varvec{a}}}^{2}\) is used later to approximate the predictive distribution.
Here let us take the analytical form of the predictive distribution of Eq. (13) when the regression model defined in Eq. (7) is a linear regression and the probabilistic model defined in Eq. (9) is a Gaussian distribution.
where \(\sigma_{pre}^{2}\) is the variance of the predictive distribution. For example, when the regression model is a linear regression as \(f\left( {{\varvec{x}},\hat{\user2{a}}} \right) = \hat{\user2{a}}^{{\text{T}}} \user2{\varphi }\left( {\varvec{x}} \right)\) can be analytically solved as the analytical solution of the least-squares as \(\hat{\user2{a}} = \left( {{{\varvec{\Phi}}}^{{\text{T}}} {{\varvec{\Phi}}}} \right)^{ - 1} {{\varvec{\Phi}}}^{{\text{T}}} {\mathbf{Y}}\) where \({{\varvec{\Phi}}}\) is a matrix whose elements consist of \(\varphi_{j} \left( {{\varvec{x}}_{i} } \right)\) (so-called design matrix).
The variance \(\sigma_{pre}^{2}\) is now easily computed by the following two facts. One is that the two distributions, the probabilistic model and the posterior distribution, both of which compose the predictive distribution, are independent of each other. This is actually general characteristics of predictive distributions. The other one is that both of the two distributions are now Gaussian distributions. Based on these two facts, the variance \(\sigma_{pre}^{2}\) is computed simply the sum of the variance of each distribution as:
where \(\hat{\sigma }^{2}\) is the variance of the probabilistic model the noise defined by Eq. (9) [and it was obtained by Eq. (11)]. \(\sigma_{{\varvec{a}}}^{2}\) is the variance of the posterior distribution. It is regarded as the uncertainty of the model parameter \({\varvec{a}}\). Both \(\hat{\sigma }^{2}\) and \(\sigma_{{\varvec{a}}}^{2}\) are the variances obtained after the learning process. The mixed predictive distributions stemmed from all the uncertainties, which are not valued as important in this article, can be then efficiently computed.
1.3 3. Comparison with the empirical Bayes approach and the predictive distribution composed of all the mixed uncertainties
The objective of this paper is UQ of QoIs derived from the turbulence closure coefficients \({\varvec{a}}\) taking the uncertainty of the noise into account. This modeling can cover all the uncertainties that are present under the probabilistic model on our definition. Here influences of the modeling that the noise parameter is also treated as stochastic are demonstrated by comparing the results of the empirical Bayes approach in which the noise parameter is treated as deterministic as \(\hat{\sigma }\) (e.g. obtained by Eq. (11)). The Eqs. (13) and (15a) are adjusted to Eq. (25) and the following equation, respectively:
The only difference between Eqs. (15a) and (28) is the posterior distributions \(p\left( {{\varvec{a}}{|}{\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\) in Eq. (28) and \(p\left( {{\varvec{a}}{|}{\mathbf{X}},{\mathbf{Y}}} \right)\) in Eq. (15a), whose distributions are computed by MCMC. The difference between these posterior distributions can be shown in the next paragraph with Fig. 15. The predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\) by Eq. (25) reflects the contribution of all the uncertainties (see Fig. 16 as an example). The results of reflecting all the uncertainties are shown in the paragraph after the next.
The difference of the predictive distributions between the full Bayes approach and the empirical Bayes approach are expected to be remarkable when the posterior distribution of \(\sigma\) is wide, i.e. when a large uncertainty on \(\sigma\). Figure 15 shows marginal posterior distributions and predictive distributions for the inference using drag coefficient, on comparison of Fig. 10, which has relative wide distribution in terms of \(\sigma\). Overall the uncertainties of the marginal posterior distributions (of \({\varvec{a}}\)) and the accompanying predictive distributions are estimated to be smaller than the cases when the full Bayes approach in the main context is used. Note that the mode \(\hat{\user2{a}}\) is common for the both approaches since it can be evaluated just by the point-estimate.
Next, an example of the predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\) when the contribution of all the uncertainties is accounted is presented. Figure 16 shows comparison between the predictive distribution \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)} \right|_{{\varvec{a}}}\) computed by Eq. (28) and the predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\) computed by Eqs. (25–27)). The approach commonly used here is the empirical Bayes. Therefore, the noise parameter \(\sigma\) is fixed at \(\hat{\sigma }\). The result obtained by the fully Bayesian approach is shown in Fig. 14-Section \(ii\) in the main context. It can be interesting to compare Fig. 16a and Fig. 14-Section \(ii\) like comparison between Figs. 10 and 15 for \(C_{D}\), which shows that there is fewer differences in the cases of \(C_{p}\) than those of the case of \(C_{D}\). However, the main focus here is to confirm the differences of the uncertainties between \(\left. {p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)} \right|_{{\varvec{a}}}\) and \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\). The predictive distribution \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\) (see 95% CI in the figures) is not useful anymore to observe the uncertainty associated with the parameter \({\varvec{a}}\) which is mixed to be lost in \(p\left( {y{|}{\varvec{x}}, {\mathbf{X}},{\mathbf{Y}},\hat{\sigma }} \right)\).
1.4 4. Probabilistic models
The probabilistic models of hypotheses (A), (B) and (C) in Sect. 3.2, i.e. how the likelihoods expressed by Eqs. (17), (18) and (19) are originally derived, are described here. The probabilistic model of hypothesis (A) is equivalent to Eq. (9) as mentioned in the main context. The probabilistic model of hypothesis (B) is expressed by a multivariate Gaussian distribution, whose covariance is a known structure but parametrized by \({\varvec{\sigma}} = \left( {\sigma^{\left( 1 \right)} ,\sigma^{\left( 2 \right)} , \cdots ,\sigma^{{\left( {N_{sec} } \right)}} } \right)\), with \(N_{sec}\) = 3.
\(\begin{aligned}&{\text{where}},\ {\mathbf{f}}.\left( {{\mathbf{X}},{\varvec{a}}} \right) = \left( {\begin{array}{*{20}c} {\begin{array}{*{20}c} {f\left( {{\varvec{x}}^{{\left( {1,1} \right)}} ,{\varvec{a}}} \right)} \\ \vdots \\ {f\left( {{\varvec{x}}^{{\left( {N_{1} ,1} \right)}} ,{\varvec{a}}} \right)} \\ \end{array} } \\ {\begin{array}{*{20}c} {f\left( {{\varvec{x}}^{{\left( {1,2} \right)}} ,{\varvec{a}}} \right)} \\ \vdots \\ {f\left( {{\varvec{x}}^{{\left( {N_{2} ,2} \right)}} ,{\varvec{a}}} \right)} \\ \end{array} } \\ {\begin{array}{*{20}c} {f\left( {{\varvec{x}}^{{\left( {1,3} \right)}} ,{\varvec{a}}} \right)} \\ \vdots \\ {f\left( {{\varvec{x}}^{{\left( {3,3} \right)}} ,{\varvec{a}}} \right)} \\ \end{array} } \\ \end{array} } \right) \hfill \\ &\Sigma = \left( {\begin{array}{*{20}l} {\begin{array}{*{20}l} {\sigma^{{\left( 1 \right)^{2} }} } \hfill & {} \hfill & {} \hfill \\ {} \hfill & \ddots \hfill & {} \hfill \\ {} \hfill & {} \hfill & {\sigma^{{\left( 1 \right)^{2} }} } \hfill \\ \end{array} } \hfill & {} \hfill & 0 \hfill \\ {} \hfill & {\begin{array}{*{20}l} {\sigma^{{\left( 2 \right)^{2} }} } \hfill & {} \hfill & {} \hfill \\ {} \hfill & \ddots \hfill & {} \hfill \\ {} \hfill & {} \hfill & {\sigma^{{\left( 2 \right)^{2} }} } \hfill \\ \end{array} } \hfill & {} \hfill \\ 0 \hfill & {} \hfill & {\begin{array}{*{20}l} {\sigma^{{\left( 3 \right)^{2} }} } \hfill & {} \hfill & {} \hfill \\ {} \hfill & \ddots \hfill & {} \hfill \\ {} \hfill & {} \hfill & {\sigma^{{\left( 3 \right)^{2} }} } \hfill \\ \end{array} } \hfill \\ \end{array} } \right),\end{aligned}\) where \({\mathbf{f}}.\) means a vector where the common function \(f\) is applied to each sample point of the data \({\mathbf{X}}\).The model can be also regarded as a mixture of Gaussians with fixed weighted parameter. The dataset \(\left( {{\mathbf{X}},{\mathbf{Y}}} \right)\) contains the data points from all the sections in order aligned to the covariance matrix \({{\varvec{\Sigma}}}\), e.g. \({\mathbf{X}} = \left( {{\varvec{x}}^{{\left( {1,1} \right)}} , \ldots ,{\varvec{x}}^{{\left( {N_{1} ,1} \right)}} ,{\varvec{x}}^{{\left( {1,2} \right)}} , \ldots ,{\varvec{x}}^{{\left( {N_{2} ,2} \right)}} ,{\varvec{x}}^{{\left( {1,3} \right)}} , \ldots ,{\varvec{x}}^{{\left( {N_{3} ,3} \right)}} } \right)\).
The probabilistic model of hypothesis (C) is essentially the same as hypothesis (A), but set to each section independently.
1.5 5. Error functions (simplified likelihood/posterior functions) used for the point-estimate
We introduced globally two derivations in the entire process. One is the point-estimate. The other one is the interval-estimate. Both of them were described in a Bayesian perspective in the main context.
As noted in Sect. 2.2, error functions can be used in the point-estimate using optimizers. Computing the mode of the posterior using MCMC is also the point-estimate but it cannot be applied in general to the error functions (the simplified likelihood/posterior functions). In practical applications, these error functions were used for the sake of computational effort. This causes reduction of the computational resource not only by its simplicity of the function itself but also by dimensionality reduction of the parameter search space in some cases.
Here the simplified equations actually used for Eq. (10), Eqs. (17–19), and the posterior using Eq. (16) are introduced. First, it is well-known that the error function of Eq. (10) when using Eq. (9) as its probabilistic model leads to the ordinary least squares. The main procedure is first to take its logarithm and then to consider the derivation of the log equation with respect to the parameters to be inferred to equal to 0. Constant terms and scaling parameters can be omitted in the point-estimate. In the ordinary least-squares, \({\varvec{a}}\) can be optimized independently of \(\sigma\). Therefore \(\hat{\sigma }\) is obtained by using the tuned result \(\hat{\user2{a}}\). Equations (17) and (19) lead to the ordinary least squares as well since the likelihood functions of them are Eq. (10). In the same manner when a Gaussian prior is used, the error function using Eq. (16) becomes the ordinary least-squares with the L2 norm regularization (so-called Ridge regression) which is shown later.
In the same manner, the following error function was used for the optimization in Eq. (18) in the hypothesis (B):
In this case, the variance vector does not disappear as a scaling constant in the function. The important property is that the optimization problem defined by Eq. (30) is an ill-posed problem [8]. This problem is brought when the model is a mixture model. When \(\sigma^{\left( j \right)} \to 0\), the cost function diverges to the infinity. The Bayesian approach using the MCMC might be more stable to obtain the mode.
As a final example, Eq. (16), which is the case when prior information is used is considered. Two cases where the noise parameter \(\sigma\) is fixed or not are shown here. The case where the parameter \(\sigma\) is fixed in advance (denoted as \(\hat{\sigma }\)) corresponds to the ordinary least-squares with L2 norm regularization (so-called ridge regression). Literally this is the MAP since the process itself is maximizing the posterior function. By taking the logarithm and derivatives to be 0:
where \(\lambda^{\left( j \right)}\) can be a regularization parameter defined as a user input parameter instead of defining \(\sigma_{pri}^{\left( j \right)}\). \(M\) is the dimensionality of the model parameter \({\varvec{a}}\). e.g. \(M\) = 12 when using the original turbulence model parameter. Equation (31) is a kind of ridge regressions. Therefore, the point-estimate using a Gaussian prior distribution in Sect. 3.1 is a ridge regression itself. It is also well-known that the equation corresponding to Eq. (31) becomes a lasso regression when a Laplace distribution is used for the prior \(p\left( {\varvec{a}} \right)\) in Eq. (16).
Our case is to take into account the case where both the model parameter and the noise parameter are learned by data. The error function is represented by the following formulation to execute an optimization in the parameter space of (\({\varvec{a}},\sigma\)):
Since the noise parameter is a scalar, ordinary optimization process is expected to work better than the case of Eq. (30) because of the dimensionality of the search space.
1.6 6. Generalized perspective of statistical modelling
The statistical modelling described in Sect. 2 can be summarized by a generalized perspective which can be more practical when the probabilistic model that we define is complicated. The process of the modelling is then generalized with assistance of the graphical model (the Bayesian network) as follows:
-
1.
describe a graphical model (a Bayesian network),
-
2.
describe the joint probability distribution of all the stochastic variables as \(p\left( {y,{\mathbf{Y}}} \right)\),
-
3.
describe the conditional probability distribution of the prediction output conditioned by the observed variables as \(p\left( {y{|}{\mathbf{Y}}} \right)\) with marginalizing all the latent variables,
where \({\mathbf{Y}}\) and \(y\) specify a dataset and a (arbitrary) prediction of the output, respectively. The graphical model in the step 1 is described by your defined probabilistic model. Figure 17 shows the graphical model of the fully Bayesian approach defined in Sect. 2. Please refer some books [8] to follow the rules of describing graphical models.
In the second step, the joint probability distribution composed of all the stochastic variables (\(y,{\mathbf{Y}},{\varvec{a}},\sigma\)) according to the graphical model:
Deterministic hyperparameters on the independent priors \(p\left( {\varvec{a}} \right)\) and \(p\left( \sigma \right)\) are omitted for simplicity (e.g. \(p\left( {{\varvec{a}}{|}\alpha } \right)\) and \(p\left( {\sigma {|}\beta } \right)\)).
The third step is composed by two small steps. As indicated in the graphical model, the stochastic variable \({\mathbf{Y}}\) is an observed variable. Therefore, the joint distribution by Eq. (33) is modified as conditioned by \({\mathbf{Y}}\) as follows by using the product rule:
Here, the Bayes’ theorem and the property of head-to-head among \({\varvec{a}},\sigma\) and \({\mathbf{Y}}\) as an observed variable visualized in the graphical model are used for the deformation of the formula, resulting in a joint probability of the posterior distribution. All the latent variables (here they are \({\varvec{a}}\) and \(\sigma\)) in Eq. (34) are finally marginalized (integrated out) by using the sum rule of the probability:
Equation (35) is exactly equal to Eq. (13).
Thus, by specifying the stochastic variables in the model, and by classifying them into the observed variables and the latent variables, a general process of obtaining the predictive distribution is introduced in any complex models with visualization assist by graphical modeling and useful properties equipped in them.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Maruyama, D., Bekemeyer, P., Görtz, S. et al. Data-driven Bayesian inference of turbulence model closure coefficients incorporating epistemic uncertainty. Acta Mech. Sin. 37, 1812–1838 (2021). https://doi.org/10.1007/s10409-021-01152-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10409-021-01152-5