1 Introduction

LES is a powerful tool for simulating a wide range of flows, including turbulent and reacting flows. Although LES is more expensive than Reynolds-Averaged Navier-Stokes (RANS) simulations, with rapid advances in fast and efficient computer hardware and in scalable, readily available software, LES is increasingly being used in a wide range of industries (aerospace, automotive, energy, chemical) for modelling fluid flows in complex and often realistic-size geometries (Gicquel et al. 2012; Pitsch 2006). In comparison to Direct Numerical Simulations (DNS), where all length and time-scales are resolved, LES reduces the computational load substantially by resolving only the largest scales.

LES comes in two main flavours: implicit and explicit (Gicquel et al. 2012; Sagaut 2001). In implicit LES, the filtering is effectively performed by the numerical scheme, the goal being to obtain steady or at least bounded solutions for a given mesh size/time-step. In explicit LES, a spatial filter of width \(\Delta \) is applied to the governing equations, and the unresolved terms appearing in the resulting equation set are modelled explicitly. This is done either by developing suitable algebraic functions of the resolved variables on the mesh, and/or by developing and solving suitable transport equations. In the majority of classic approaches the ratio of mesh spacing h to filter width is \(h/ \Delta =1\), but this need not be the case, as we discuss later on. Each of these two approaches has its merits and drawbacks; in this chapter we focus on explicit LES, which solves the filtered equations. The filtered compressible momentum equation reads,

$$\begin{aligned} \frac{\partial \overline{\rho }\tilde{u}_i}{\partial t} + \frac{\partial \overline{\rho } \tilde{u}_i \tilde{u}_j}{\partial x_j} = - \frac{\partial \overline{p}}{\partial x_i} + \frac{\partial \tau _{ij}^r}{\partial x_j} - \frac{\partial \tau _{ij}}{\partial x_j}, \end{aligned}$$
(1)

where the overbar denotes spatial filtering using a suitable filter i.e.

$$\begin{aligned} \overline{\phi }( {x},t)=\int _{-\infty }^{\infty }G( {x}- {x}';\Delta )\phi ( {x}')d {x}', \end{aligned}$$
(2)

where G is the LES filter and \(\phi \) the quantity being filtered. Note that \(\tilde{}\) denotes Favre-filtering i.e. \(\tilde{\phi }=\overline{\rho \phi }/ \bar{\rho }\). The resolved and unresolved stress tensors \(\tau _{ij}^r\) and \(\tau _{ij}\) are given by,

$$\begin{aligned} \tau _{ij}^r=\overline{\mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) } - \frac{2}{3}\delta _{ij}\overline{\mu \frac{\partial u_k}{\partial x_k}}, \end{aligned}$$
(3)

and

$$\begin{aligned} \tau _{ij}=\bar{\rho }(\widetilde{u_i u_j} - \tilde{u}_i \tilde{u}_j), \end{aligned}$$
(4)

respectively. The resolved stress tensor is typically closed using the gradients of the filtered velocity components (hence termed resolved: not because it is actually resolved, but because the approximation below is such a good one),

$$\begin{aligned} \tau _{ij}^r\simeq \bar{\mu }\left( \frac{\partial \tilde{u}_i}{\partial x_j} + \frac{\partial \tilde{u}_j}{\partial x_i} - \frac{2}{3}\delta _{ij}\frac{\partial \tilde{u}_k}{\partial x_k} \right) =2\bar{\mu }\left( \tilde{S}_{ij}-\frac{1}{3}\delta _{ij}\tilde{S}_{kk}\right) , \end{aligned}$$
(5)

where

$$\begin{aligned} \tilde{S}_{ij}=\frac{1}{2}\left( \frac{\partial \tilde{u}_i}{\partial x_j}+\frac{\partial \tilde{u}_j}{\partial x_i}\right) , \end{aligned}$$
(6)

is the (resolved) rate of strain tensor. Clearly \(\tau _{ij}\) is an unclosed term and requires modelling in order to produce a closed equation set. This term is very important since it determines the dissipation/back-scatter of kinetic energy (Sagaut 2001): multiplying Eq. 1 by \(\tilde{u}_i\) and summing over i, it is straightforward to show that the contribution of the unresolved stress tensor to the resolved total kinetic energy \(e_r=1/2 \tilde{u}_i \tilde{u}_i\) is \(-\tilde{u}_i \partial {\tau _{ij}} / \partial {x_j}\).
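The dissipation/backscatter role of this term can be made explicit with a standard decomposition: using the symmetry of \(\tau _{ij}\), its contribution splits into a transport part and an exchange part,

$$\begin{aligned} -\tilde{u}_i \frac{\partial \tau _{ij}}{\partial x_j} = -\frac{\partial (\tilde{u}_i \tau _{ij})}{\partial x_j} + \tau _{ij}\tilde{S}_{ij}, \end{aligned}$$

where the first term merely redistributes resolved kinetic energy in space, while \(-\tau _{ij}\tilde{S}_{ij}\) acts as the subfilter dissipation: positive for forward scatter of energy to the unresolved scales, negative for backscatter.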

A large number of different models for \(\tau _{ij}\) have been developed in the literature throughout the years, aimed mainly at incompressible and non-reacting flows (Meneveau and Katz 2000). In the classic modelling approach, the stress tensor is modelled by developing suitable algebraic functions of the resolved quantities. In incompressible flows, for instance, these include the filtered velocity components \(\bar{u}_i\) as well as any other derived quantities such as their gradients and/or functions of their gradients, higher-order filtered values of the aforementioned quantities etc. The majority of these models are relatively straightforward to implement, while the computational cost depends on the formulation: the dynamic evaluation of model parameters can be substantially more expensive than the static approach (where a constant value for a certain parameter is assumed). A common characteristic of all of the aforementioned models is that they usually involve some simplifying assumption in their development, which may or may not be valid for conditions other than those for which they were originally developed. For example, the Boussinesq assumption is a rather strong one (Schmitt 2007). Previous theoretical as well as experimental work showed that this assumption is invalid both for non-reacting (Tao et al. 2000, 2002) and reacting flows (Klein et al. 2015; Pfandler et al. 2010). Another issue with classic algebraic models is that they involve tunable parameters whose spatio-temporal variation depends on the flow regime and/or reaction mode. As a result, a single universal method for accurate parameterisation/regularisation of the models’ constants is difficult to obtain.

Despite the aforementioned issues, the standard approach in reacting LES is to employ models originally developed and validated for incompressible and non-reacting flows. Reacting flows, however, bring additional challenges. The heat release causes large variations in density, temperature, velocity, and viscosity across the flame-front. All of these quantities affect the modelling of the stress tensor. Models developed for non-reacting and incompressible flows do not account for such effects. For instance, it was shown in Klein et al. (2015), as well as in previous theoretical and experimental studies (Bray et al. 1981; Chomiak and Nisbet 1995), that even for simple flow configurations such as freely-propagating premixed flames classic models are inadequate. In particular, it was shown (Klein et al. 2015) that counter-gradient transport also occurs for the components of the stress tensor, and as a result classic static gradient-type models cannot capture it. Even dynamic models, where the sign of the dynamic parameter can in principle change, fail to capture counter-gradient transport (Klein et al. 2015). In addition, it was shown in Klein et al. (2015) that the standard averaging procedure for regularising the dynamic parameters, e.g. \(C_D\) in the Smagorinsky model, is not suitable for reacting flows. The behaviour and performance of these models for more demanding configurations, such as shear-induced flows with larger spatial inhomogeneity, is unclear, and the deficiencies of such models can only be unveiled through further investigation using both a priori and a posteriori studies. All of these issues essentially limit the predictive ability of LES to conditions where the models for the unresolved terms are known to perform well.

In light of the aforementioned long-standing issues, in the past few years a wide range of alternative non-classic modelling strategies have been proposed and evaluated (Domingo et al. 2020), including machine-learning, which has the potential to circumvent such issues. Data-driven methods, which include a wide range of network architectures, have been widely used to solve classification and regression problems in image recognition (Krizhevsky et al. 2012), text translation (Sutskever et al. 2014), decision making (Mnih et al. 2015; Silver et al. 2016), gene profiling (Khan et al. 2001) etc. by directly exploiting the abundance of information contained within very large data sets. In the field of fluid mechanics, databases are also quite substantial: DNS databases of non-reacting flows, for instance, are of the order of petabytes (Kanov et al. 2015). In reacting flows, simulations using DNS with detailed chemistry and multi-step reduced chemistry are slowly yet steadily becoming more common (Aspden et al. 2016; Minamoto et al. 2011; Nikolaou and Swaminathan 2014, 2015; Wang et al. 2017), while numerical solvers are being developed for DNS aimed at the exascale (Treichler et al. 2017) and exploiting hybrid architectures (Perez et al. 2018). As a result, the application of machine-learning techniques using data from such high-fidelity simulations for modelling purposes in LES appears to be a timely one.

In the text which follows, Sect. 2 presents some fundamental and popular models in the literature which have been the subject of recent and extensive testing in reacting flows (Nikolaou et al. 2019, 2021). In Sect. 3 another emerging approach, namely deconvolution, is discussed, and in Sect. 4 a review of the main machine-learning approaches is given. The main challenges and caveats associated with machine-learning methods are summarised in Sect. 6.
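Before moving to specific closures, the filtering operation of Eq. (2) and the Favre-filtering defined above can be illustrated with a minimal discrete sketch (a top-hat filter on a periodic 1-D field; the filter choice and function names are illustrative assumptions, not tied to any particular solver):

```python
import numpy as np

def box_filter(phi, nfilt):
    """Discrete top-hat filter of width nfilt points on a periodic 1-D field,
    a simple discretisation of the convolution in Eq. (2)."""
    half = nfilt // 2
    out = np.zeros_like(phi, dtype=float)
    for s in range(-half, nfilt - half):
        out += np.roll(phi, s)
    return out / nfilt

def favre_filter(rho, phi, nfilt):
    """Favre (density-weighted) filter: tilde(phi) = bar(rho*phi) / bar(rho)."""
    return box_filter(rho * phi, nfilt) / box_filter(rho, nfilt)
```

As expected of a normalised kernel, a constant field passes through unchanged, while a sinusoid has its amplitude damped, the damping growing with wavenumber and filter width.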

2 Classic Stress Tensor Models

2.1 Smagorinsky

The Smagorinsky model is an eddy-diffusivity type of model originally developed for application to atmospheric flows (Moin et al. 1991; Smagorinsky 1963). The stress tensor closure reads,

$$\begin{aligned} \tau _{ij}-\frac{1}{3}\delta _{ij}\tau _{kk}=-2\bar{\rho }\nu _t\left( \tilde{S}_{ij}-\frac{1}{3}\delta _{ij}\tilde{S}_{kk}\right) , \end{aligned}$$
(7)

where the turbulent viscosity \(\nu _t\) is modelled using \(\nu _t=(C_D\Delta )^2|\tilde{S}|\) with \(|\tilde{S}|=\sqrt{2\tilde{S}_{ij}\tilde{S}_{ij}}\). In the original (static) version \(C_D\) is replaced by \(C_S^2\) with \(C_S\simeq 0.2\). It is a very popular model as it is relatively straightforward to implement and computationally efficient. From a theoretical point of view, however, there are some key issues to highlight. Firstly, it is a purely dissipative model, whereas a reverse flow of energy (backscatter) from the smaller scales to the larger scales is known to exist both in 2D flows, as shown by Fjortof (1953), and in 3D flows (Domaradzki et al. 1993; Kerr et al. 1996; Piomelli et al. 1991). In addition, the assumption that the unresolved stress tensor is aligned with the resolved rate of strain tensor is a rather strong one, as shown by previous experimental and numerical studies (Tao et al. 2000, 2002). Another issue is that the model predictions are sensitive to the value of \(C_S\) (the Smagorinsky constant), which depends on the flow regime (Deardoff 1970; Lilly 1966), but also on the filter width and mesh spacing (Mason and Callen 1986).
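For concreteness, the static closure can be sketched as follows (a numpy sketch on a uniform periodic grid, assuming \(h/\Delta =1\); the grid setup and function name are illustrative):

```python
import numpy as np

def smagorinsky_nu_t(u, v, w, dx, Cs=0.2):
    """Static Smagorinsky eddy viscosity nu_t = (Cs*Delta)^2 |S|,
    with Delta = dx (h/Delta = 1) and |S| = sqrt(2 S_ij S_ij)."""
    grads = [np.gradient(c, dx) for c in (u, v, w)]  # grads[i][j] = du_i/dx_j
    S2 = 0.0
    for i in range(3):
        for j in range(3):
            Sij = 0.5 * (grads[i][j] + grads[j][i])  # resolved strain rate, Eq. (6)
            S2 += 2.0 * Sij * Sij
    return (Cs * dx) ** 2 * np.sqrt(S2)
```

For a uniform shear \(\tilde{u}=y\) the model returns the constant value \((C_S\Delta )^2\) since \(|\tilde{S}|=1\), and for a uniform flow it correctly returns zero eddy viscosity.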

These limitations soon became apparent with the static Smagorinsky model performing relatively well for homogeneous and isotropic decaying turbulence but poorly for shear-dominated flows such as turbulent channel flow. In such configurations the value \(C_S\simeq 0.2\) in the near-wall region was found to be excessive and a reduction was required to obtain the correct (lower) dissipation. This led to the development of a dynamic version by Germano et al. (1991) where \(C_D\) was no longer constant but calculated dynamically (during the simulation) from the resolved flow variables. The dynamic Smagorinsky model showed considerable improvement over its static version, particularly in shear flows (Germano et al. 1991), and was later adapted to compressible flows by Moin et al. (1991) whereby \(C_D\) is typically calculated using the least-squares approach (Lilly 1992; Salvetti 1994),

$$\begin{aligned} C_D=\frac{\langle -(L_{ij}-\frac{1}{3}\delta _{ij}L_{kk})M_{ij} \rangle }{\langle 2\Delta ^2M_{ij} M_{ij}\rangle }, \end{aligned}$$
(8)

where \(\langle \cdot \rangle \) indicates a suitable averaging (regularisation) procedure, and \(\hat{}\) indicates test-filtering with a filter of width \(\hat{\Delta }\). The ratio \(\gamma =\hat{\Delta }/\!\Delta \) is typically taken to equal 2. The Leonard term \(L_{ij}\) is given by,

$$\begin{aligned} L_{ij}=\widehat{\bar{\rho }\tilde{u}_i\tilde{u}_j}-\widehat{(\bar{\rho }\tilde{u}_i)}\widehat{(\bar{\rho }\tilde{u}_j)}/\hat{\bar{\rho }}, \end{aligned}$$
(9)

and

$$\begin{aligned} M_{ij}=\gamma ^2\hat{\bar{\rho }} |\hat{\tilde{S}}|\left( \hat{\tilde{S}}_{ij}-\frac{1}{3}\delta _{ij}\hat{\tilde{S}}_{kk}\right) -\left( \bar{\rho }\widehat{|\tilde{S}|\tilde{S}_{ij}}-\frac{1}{3}\delta _{ij}\bar{\rho }\widehat{|\tilde{S}|\tilde{S}_{kk}}\right) . \end{aligned}$$
(10)

An important point to note is that the Smagorinsky model does not apply to the normal (isotropic) components of the stress tensor. Typically, the static Yoshizawa approximation (Yoshizawa 1986) is used to explicitly model \(\tau _{kk}\) as follows,

$$\begin{aligned} \tau _{kk}=2\bar{\rho }C_I\Delta ^2|\tilde{S}|^2, \end{aligned}$$
(11)

where in the static version the model parameter \(C_I\) is a constant. Yoshizawa suggested a value of \(\simeq \) 0.089 (Yoshizawa 1986); however, values ranging from 0.0025 to 0.009 were reported when \(C_I\) was evaluated dynamically in the study of Moin et al. (1991). In the dynamic version, \(C_I\) is calculated using (Moin et al. 1991),

$$\begin{aligned} C_I=\frac{< L_{kk}>}{<P>} \end{aligned}$$
(12)

where \(L_{kk}\) is the trace of the Leonard term, and the term P is given by,

$$\begin{aligned} P=2 \left( \hat{\bar{\rho }}{\hat{\Delta }}^2 {|{\hat{\tilde{S}}}|}^2 - {\Delta }^2\widehat{\bar{\rho } {|\tilde{S}|}^2} \right) \end{aligned}$$

From the equations presented above it becomes apparent that even for a simple model like Smagorinsky the evaluation can be rather involved: it requires the calculation of tensor variables which include gradients, as well as filtering and test-filtering operations, a process which introduces an additional ad-hoc parameter (the test-filter to filter-width ratio). It is also important to note that a regularisation procedure for the evaluation of the dynamic parameters is almost always required to render them spatially smooth, thus avoiding numerical instabilities. This process is not always unique or justifiable, and typically involves averaging in homogeneous directions (if any), thresholding, or smoothing if no homogeneous directions exist. Other more practical issues pertain to the division by near-zero numbers in the equations for \(C_D\), \(C_I\) and so on.
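The kind of regularisation referred to above can be sketched as follows (illustrative only; the averaging directions, the small constant guarding the division, and the clipping bounds are all ad-hoc choices, as discussed):

```python
import numpy as np

def regularise_dynamic_parameter(num, den, hom_axes=(0, 2),
                                 eps=1e-12, cmin=0.0, cmax=0.1):
    """Evaluate a dynamic parameter C = <num>/<den> (cf. Eqs. (8) and (12)):
    average numerator and denominator separately over the homogeneous
    directions (here axes 0 and 2, e.g. a channel flow), guard the division
    against near-zero denominators, and clip to a plausible range."""
    num_avg = np.mean(num, axis=hom_axes, keepdims=True)
    den_avg = np.mean(den, axis=hom_axes, keepdims=True)
    C = num_avg / (den_avg + eps)
    return np.broadcast_to(np.clip(C, cmin, cmax), num.shape)
```

Averaging numerator and denominator separately, before the division, is the usual choice because it keeps the denominator away from zero even where the local field changes sign.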

2.2 Scale Similarity

Consider an incompressible flow, in which case the unresolved stress tensor is simply \(\tau _{ij}=\overline{u_i u_j}-\bar{u}_i \bar{u}_j\). The closure problem reduces to finding a suitable approximation for \(\overline{u_i u_j}\). Define \(u'_i=u_i-\bar{u}_i\), i.e. the difference between the unfiltered and filtered fields. Upon expansion of the filtered product we then have,

$$\begin{aligned} \overline{u_iu_j}=&\overline{(\bar{u}_i+u'_i)(\bar{u}_j+u'_j)} \nonumber \\ =&\overline{\bar{u}_i \bar{u}_j}+\overline{\bar{u}_i u'_j}+\overline{\bar{u}_j u'_i}+\overline{u'_i u'_j } \nonumber \\ =&\overline{\bar{u}_i \bar{u}_j}+\overline{\bar{u}_i (u_j-\bar{u}_j)}+\overline{\bar{u}_j (u_i-\bar{u}_i)}+\overline{(u_i-\bar{u}_i) (u_j-\bar{u}_j)} \end{aligned}$$
(13)

Up to this point the expansion is exact; however, the closure problem has not disappeared, since we are left with further unclosed terms, namely the last three terms in the equation above. The main step in scale-similarity models to solve this problem is to assume that (Bardina et al. 1983),

$$\begin{aligned} \overline{\bar{u}_i (u_j-\bar{u}_j)} \simeq \bar{\bar{u}}_i \overline{(u_j-\bar{u}_j)}=\bar{\bar{u}}_i {(\bar{u}_j-\bar{\bar{u}}_j)} \end{aligned}$$
(14)

and that,

$$\begin{aligned} \overline{(u_i-\bar{u}_i) (u_j-\bar{u}_j)} \simeq \overline{(u_i-\bar{u}_i)} \cdot \overline{ (u_j-\bar{u}_j)}=(\bar{u}_i-\bar{\bar{u}}_i)(\bar{u}_j-\bar{\bar{u}}_j) \end{aligned}$$
(15)

i.e. essentially that the filtering operation distributes over the individual factors of each product. The above assumptions eventually lead to,

$$\begin{aligned} \tau _{ij}=\overline{\bar{u}_i\bar{u}_j}-\bar{\bar{u}}_i\bar{\bar{u}}_j \end{aligned}$$
(16)

which is the scale-similarity model (SIMB) of Bardina for incompressible flows (Bardina et al. 1983). The compressible version derived following analogous arguments reads,

$$\begin{aligned} \tau _{ij}=\bar{\rho }(\overline{\tilde{u}_i\tilde{u}_j}-\overline{\tilde{u}}_i \overline{\tilde{u}}_j), \end{aligned}$$
(17)

Scale-similarity models are able to predict backscatter, unlike the static Smagorinsky model; however, when applied in LES they have long been known to provide insufficient dissipation, clearly a result of the assumptions involving the filtering operations. In an attempt to improve the predictions of the scale-similarity model, Anderson and Domaradzki proposed an improved version (Anderson and Domaradzki 2012). Based on the inter-scale energy transfer model of Anderson and Domaradzki (2012), Klein et al. (2015) then suggested a modified version for application to reacting flows (SIMET). This model reads,

$$\begin{aligned} \tau _{ij}=\bar{\rho }\left( \widehat{\hat{\tilde{u}}_i \tilde{u}_j} + \widehat{\hat{\tilde{u}}_j \tilde{u}_i} - \hat{\tilde{u}}_i \hat{\tilde{u}}_j-\widehat{\hat{\tilde{u}}_i \hat{\tilde{u}}_j} \right) , \end{aligned}$$
(18)

In fact, a plethora of scale-similarity models exists in the literature, and a common characteristic of the majority of them is insufficient dissipation. As a result, scale-similarity models are most often applied as part of mixed models. In such models, as the name suggests, different models are combined, the most usual approach being the addition of an eddy-diffusivity type of model (typically Smagorinsky) to a scale-similarity model in order to provide sufficient dissipation.
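As a concrete illustration, the Bardina closure of Eq. (16) amounts to one extra application of the filter to the resolved fields (a 1-D periodic sketch with a top-hat filter; names and filter choice are illustrative):

```python
import numpy as np

def box_filter(phi, nfilt):
    """Top-hat filter on a periodic 1-D field (stands in for the LES filter)."""
    half = nfilt // 2
    out = np.zeros_like(phi, dtype=float)
    for s in range(-half, nfilt - half):
        out += np.roll(phi, s)
    return out / nfilt

def bardina_tau(ubar_i, ubar_j, nfilt):
    """Scale-similarity stress, Eq. (16):
    tau_ij = bar(ubar_i ubar_j) - barbar(ubar_i) barbar(ubar_j),
    computed entirely from the resolved (already filtered) velocities."""
    return (box_filter(ubar_i * ubar_j, nfilt)
            - box_filter(ubar_i, nfilt) * box_filter(ubar_j, nfilt))
```

The model vanishes identically for a constant field and is generally non-zero for a sinusoid, reflecting the activity of the smallest resolved scales from which the similarity argument extrapolates.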

2.3 Gradient Model

The gradient model (GRAD) can be derived by expanding the filtered velocity product in the expression for \(\tau _{ij}\) in a Taylor series (Vreman et al. 1996) and retaining the leading term in the expansion (Clark 1979), leading to,

$$\begin{aligned} \tau _{ij}=\bar{\rho }\frac{\Delta ^2}{12}\frac{\partial \tilde{u}_i}{\partial x_k}\frac{\partial \tilde{u}_j}{\partial x_k}, \end{aligned}$$
(19)

Models of the above kind typically give very good results in a priori studies, provided the filter width is sufficiently small so that the contribution from the terms dropped in the Taylor series expansion is small. However, like the scale-similarity models, gradient-type models were also found to provide insufficient dissipation in LES, and as a result they are mainly used in mixed models. An interesting point about the gradient model is that it is essentially a low-order deconvolution-based model (discussed later on).
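A sketch of Eq. (19) on a uniform grid follows (np.gradient stands in for the solver's derivative operators; names are illustrative):

```python
import numpy as np

def gradient_model_tau(rho_bar, u_tilde, Delta, dx):
    """Gradient model, Eq. (19):
    tau_ij = rho_bar * Delta^2/12 * (du_i/dx_k)(du_j/dx_k).
    u_tilde: the three Favre-filtered velocity components on a uniform grid."""
    grads = [np.gradient(c, dx) for c in u_tilde]  # grads[i][k] = du_i/dx_k
    tau = np.empty((3, 3) + np.shape(rho_bar))
    for i in range(3):
        for j in range(3):
            tau[i, j] = rho_bar * Delta ** 2 / 12.0 * sum(
                grads[i][k] * grads[j][k] for k in range(3))
    return tau
```

Note that the resulting tensor is symmetric by construction, as required of a stress tensor.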

2.4 Clark Model

Vreman et al. (1996) built upon the mixed model of Clark (1979) to produce the following dynamic mixed model,

$$\begin{aligned} \tau _{ij}=\bar{\rho }\frac{\Delta ^2}{12}\frac{\partial \tilde{u}_i}{\partial x_k}\frac{\partial \tilde{u}_j}{\partial x_k}-C_C\bar{\rho }\Delta ^2|\tilde{S}'|\tilde{S}'_{ij}, \end{aligned}$$
(20)

where

$$\begin{aligned} S^{\prime }_{ij} \left( \widetilde{\textbf{u}} \right) =\frac{\partial \widetilde{u}_i}{\partial x_j} + \frac{\partial \widetilde{u}_j}{\partial x_i} - \frac{2}{3}\delta _{ij}\frac{\partial \widetilde{u}_k}{\partial x_k} = 2\left( \widetilde{S}_{ij}-\frac{1}{3}\delta _{ij}\tilde{S}_{kk}\right) , \end{aligned}$$
(21)

and \(|S'|=(S'_{ij}S'_{ij}/2)^{1/2}\). In the static version \(C_C=0.172\) and in the dynamic version it is calculated using,

$$\begin{aligned} C_C=\frac{\langle M'_{ij}(L_{ij}-H_{ij})\rangle }{\langle M'_{ij}M'_{ij}\rangle }. \end{aligned}$$
(22)

Denoting \(v_i=\widehat{\bar{\rho }\tilde{u}_i}/\hat{\bar{\rho }}\), the tensors \(H_{ij}\) and \(M'_{ij}\) are given by

$$\begin{aligned} H_{ij} = \hat{\bar{\rho }}\frac{\hat{\Delta }^2}{12}\frac{\partial v_i}{\partial x_k}\frac{\partial v_j}{\partial x_k} - \frac{\Delta ^2}{12}\widehat{\left( \bar{\rho }\frac{\partial \tilde{u}_i}{\partial x_k}\frac{\partial \tilde{u}_j}{\partial x_k}\right) }, \end{aligned}$$
(23)

and

$$\begin{aligned} M'_{ij} = -\hat{\bar{\rho }}\hat{\Delta }^2 |S^{\prime }( {v})| S^{\prime }_{ij}( {v})+\Delta ^2\widehat{\left( \bar{\rho }|S^{\prime }\left( \widetilde{\textbf{u}} \right) | S^{\prime }_{ij}\left( \widetilde{\textbf{u}} \right) \right) }, \end{aligned}$$
(24)

The Clark model is a mixed model, with the first part consisting of a gradient component and the second of a Smagorinsky-type component which provides the necessary dissipation. This model gave good results for the temporal mixing layer in Vreman et al. (1996, 1997), and it was also one of the models selected for testing in Nikolaou et al. (2021) in order to elucidate any differences with the gradient model and to shed light on whether the eddy-diffusivity part improves the predictions or not.

2.5 Wall-Adapting Local Eddy-Viscosity (WALE)

This model was used to simulate a wall-impinging jet with overall good results in Lodato et al. (2009). It is a mixed model with a Smagorinsky-type component and a scale-similarity component,

$$\begin{aligned} \tau _{ij}-\frac{1}{3}\delta _{ij}\tau _{kk}=-2\bar{\rho }\nu _t\left( \tilde{S}_{ij}-\frac{1}{3}\delta _{ij}\tilde{S}_{kk}\right) +\bar{\rho }(\widehat{\tilde{u}_i\tilde{u}_j}-\hat{\tilde{u}}_i\hat{\tilde{u}}_j), \end{aligned}$$
(25)

The turbulent viscosity is calculated from the velocity gradient and shear rate tensors using,

$$\begin{aligned} \nu _t=(C_W\Delta )^2\frac{(\tilde{s}^d_{ij}\tilde{s}^d_{ij})^{3/2}}{(\tilde{S}_{ij}\tilde{S}_{ij})^{5/2}+(\tilde{s}^d_{ij}\tilde{s}^d_{ij})^{5/4}}, \end{aligned}$$
(26)

The model constant \(C_W=0.5\), and \(\tilde{s}^d_{ij}\) is the traceless symmetric part of the squared resolved velocity gradient tensor \(\tilde{g}_{ij}=\partial \tilde{u}_i/\partial x_j\),

$$\begin{aligned} \tilde{s}^d_{ij}=\frac{1}{2}(\tilde{g}_{ij}^2+\tilde{g}_{ji}^2)-\frac{1}{3}\delta _{ij}\tilde{g}_{kk}^2, \end{aligned}$$
(27)

where \(\tilde{g}_{ij}^2=\tilde{g}_{ik}\tilde{g}_{kj}\). Note that in this case as well, the static Yoshizawa closure is used to model the trace of the stress tensor, as discussed above.
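The WALE viscosity of Eqs. (26)-(27) can be sketched as follows, with the standard \((C_W\Delta )^2\) scaling (uniform periodic grid; the small constant guarding the denominator is an added assumption for laminar regions):

```python
import numpy as np

def wale_nu_t(u, dx, Cw=0.5):
    """WALE eddy viscosity from the resolved velocity gradient tensor g_ij,
    with Delta = dx. u: list of the three resolved velocity components."""
    g = np.array([np.gradient(c, dx) for c in u])   # g[i][j] = du_i/dx_j
    swap = (1, 0) + tuple(range(2, g.ndim))         # transpose i <-> j
    g2 = np.einsum('ik...,kj...->ij...', g, g)      # g^2_ij = g_ik g_kj
    sd = 0.5 * (g2 + np.transpose(g2, swap))        # symmetric part of g^2
    tr = np.trace(g2, axis1=0, axis2=1)
    for i in range(3):
        sd[i, i] -= tr / 3.0                        # remove the trace, Eq. (27)
    S = 0.5 * (g + np.transpose(g, swap))           # resolved strain rate
    sd2 = np.einsum('ij...,ij...->...', sd, sd)
    S2 = np.einsum('ij...,ij...->...', S, S)
    eps = 1e-30                                     # guard against 0/0
    return (Cw * dx) ** 2 * sd2 ** 1.5 / (S2 ** 2.5 + sd2 ** 1.25 + eps)
```

For a pure shear \(\tilde{u}=y\) the squared gradient tensor vanishes, so \(\nu _t=0\): this is precisely the near-wall behaviour the model was designed for.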

3 Deconvolution-Based Modelling

Deconvolution methods were probably first introduced into fluid mechanics research in the works of Leonard and Clark (Clark 1979; Leonard 1974). Deconvolution aims to invert the filtering operation in LES in order to obtain an approximation \(\phi ^*\) of the unfiltered field from the filtered field \(\bar{\phi }\) which is resolved by the LES. Filtered non-linear functions of \(\phi \) can then be approximated using the deconvoluted fields, i.e. \(\overline{f(\phi )} \simeq \overline{f(\phi ^*)}\). In the case of the unresolved stress tensor, \(\tau _{ij}\) is a function of the three velocity components, and the term is closed using \(\tau _{ij} \simeq \bar{\rho } ( \widetilde{u^*_i u^*_j}-\tilde{u}_i \tilde{u}_j)\). Since deconvolution is a purely mathematical operation relating filtered and unfiltered fields, such methods do not involve any physical assumptions and/or any modelling parameters/constants. As a result, in principle, they can be used to model a wide range of unresolved terms in the governing equations for different flow configurations, including both reacting and non-reacting flows. The deconvolution can be accomplished with (a) approximate methods, (b) iterative methods and (c) machine-learning.

Approximate methods are based on truncated Taylor series expansions of the inverse filtering operation. This approach was used to derive explicit algebraic models for the Reynolds stresses in non-reacting flows (Domaradzki and Saiki 1997; Geurts 1997). In the works of Stolz and Adams (1999) an Approximate Deconvolution Method (ADM) based on a truncated expansion of the inverse filter operation was used, and the deconvoluted signal was then explicitly filtered to obtain closures for the Reynolds stresses. The method was later used by the same authors to model the Reynolds stress terms in wall-bounded flows as well (Stolz and Adams 2001) where classic models such as the static Smagorinsky model are otherwise too dissipative. Approximate deconvolution methods have also been applied to reacting flows (Domingo and Vervisch 2015, 2017; Mathew 2002; Mehl and Fiorina 2017) with overall good results.

Iterative deconvolution methods include the use of reconstruction algorithms such as van Cittert iterations (Nikolaou et al. 2019; Nikolaou and Vervisch 2018; Nikolaou et al. 2018) or otherwise (Wang and Ihme 2017). The classic van Cittert algorithm with a constant coefficient b reads,

$$\begin{aligned} {\phi ^*}^{n+1}={\phi ^*}^{n}+b(\bar{\phi }-G*{{\phi ^*}^n}) \end{aligned}$$
(28)

where \({\phi ^*}^0=\bar{\phi }\), and \({\phi ^*}^n\) is the approximation of the unfiltered field at iteration n. For \(\phi =\rho u_i\) and \(\phi =\rho \) with \(b=1\) (a typical value), the first two iterations result in the following approximations for the unfiltered density and density-velocity product,

$$\begin{aligned}&\rho ^{*0} =\overline{\rho } \\&\rho ^{*1} =2\overline{\rho }-\overline{\overline{\rho }} \\&\lbrace \rho u_i \rbrace ^{*0} =\overline{\rho u_i} \\&\lbrace \rho u_i \rbrace ^{*1} =2\overline{\rho u_i}-\overline{\overline{\rho u_i}} \\ \end{aligned}$$

The \(n^{th}\) approximation of \(\rho u_i u_j\) is calculated using \(\lbrace \rho u_i u_j \rbrace ^{*n}=\lbrace \rho u_i \rbrace ^{*n}\lbrace \rho u_j \rbrace ^{*n}/ \rho ^{*n}\), and the corresponding approximation of the unresolved stress tensor is calculated using \(\tau _{ij}^n=\bar{\rho }(\overline{ \lbrace \rho u_i u_j \rbrace ^{*n} }/ \bar{\rho }-\tilde{u}_i\tilde{u}_j )\). It is straightforward to show that the first two are,

$$\begin{aligned}&\tau _{ij}^{0}=\overline{\bar{\rho }\tilde{u}_i\tilde{u}_j}-\bar{\rho }\tilde{u}_i\tilde{u}_j \\&\tau _{ij}^{1}= \overline{\left( \frac{ 4\overline{\rho u_i} \cdot \overline{\rho u_j}-2\overline{\rho u_i}\cdot \overline{\overline{\rho u_j}}-2\overline{\rho u_j}\cdot \overline{\overline{\rho u_i}}+\overline{\overline{\rho u_i}} \cdot \overline{\overline{\rho u_j}} }{2\bar{\rho }-\bar{\bar{\rho }}} \right) }-\bar{\rho }\tilde{u}_i\tilde{u}_j \\ \end{aligned}$$

Note that for \(n=0\), a Bardina-like scale-similarity model is recovered. For \(n=1\) an extended similarity-like model is obtained which involves double and triple-filtered quantities and so on for higher-order approximations. Successive iterations lead to higher-order approximations of the unfiltered fields and of the unresolved stress tensor as shown by Stolz and Adams (2001). For example, four iterations are sufficient to recover the gradient model supplemented by the next term in the series (Eq. B9 in Stolz and Adams 1999).

It is important to note that deconvolution methods can only recover wavenumbers which are resolved by the LES mesh. As a result, deconvolution methods require \(h/ \Delta < 1\) so that scales smaller than \(\Delta \) can be recovered. The van Cittert algorithm is a linear one, and for periodic signals it is straightforward to show that for a sufficiently large number of iterations, and provided \(0<b<2\), the algorithm is stable and converges to the original value of the unfiltered field for all finite wavenumbers on the mesh (Nikolaou and Vervisch 2018). b is typically taken to equal 1 for non-oscillatory convergence, as shown in Nikolaou and Vervisch (2018). The maximum number of iterations required for a sufficiently small reconstruction error depends on the largest wavenumber resolved by the mesh, i.e. on the \(h/ \Delta \) ratio, with increasing resolution requiring a larger number of iterations.
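The algorithm is simple to implement. The sketch below deconvolves a top-hat-filtered periodic signal and verifies that the iterations reduce the reconstruction error (the filter choice and names are illustrative; \(b=1\) as in the text):

```python
import numpy as np

def box_filter(phi, nfilt):
    """Top-hat filter on a periodic 1-D signal (plays the role of G* in Eq. (28))."""
    half = nfilt // 2
    out = np.zeros_like(phi, dtype=float)
    for s in range(-half, nfilt - half):
        out += np.roll(phi, s)
    return out / nfilt

def van_cittert(phi_bar, nfilt, n_iter=20, b=1.0):
    """Van Cittert iterations, Eq. (28):
    phi*^{n+1} = phi*^n + b*(phi_bar - G*phi*^n), starting from phi*^0 = phi_bar."""
    phi = phi_bar.copy()
    for _ in range(n_iter):
        phi = phi + b * (phi_bar - box_filter(phi, nfilt))
    return phi
```

Only wavenumbers represented on the mesh are recovered, consistent with the \(h/\Delta <1\) requirement discussed above.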

4 Machine-Learning Based Models

The use of machine-learning methods, and specifically artificial neural networks, can be justified theoretically by the seminal work of Hornik (1991), where it was proven that a feed-forward neural network, even with a single hidden layer, acts as a universal function approximator (for functions with certain properties) in the limit of a sufficiently large number of nodes. As a result, algebraic closures of increasing complexity can in principle be developed, e.g. for the stress tensor, by adjusting the number of layers and/or nodes. Machine-learning methods for modelling the stress tensor in the context of LES can (thus far) be roughly divided into three distinct categories:

(a) Optimization/tuning of existing model parameters and/or their evaluation procedures.

(b) Direct modelling of the stress tensor using as inputs variables which are resolved by the LES.

(c) Deconvolution-based approaches.
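Hornik's universal-approximation property is easy to demonstrate with a minimal single-hidden-layer network trained by gradient descent on a smooth 1-D target (a self-contained numpy sketch; the target function, network width and hyperparameters are arbitrary illustrative choices, not taken from the studies discussed below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Smooth nonlinear target on [-1, 1]
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * x)

# Single hidden layer of 32 tanh nodes: a universal approximator as width grows
W1 = rng.normal(0.0, 1.0, (1, 32)); b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, 1)); b2 = np.zeros(1)

lr = 0.05
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)                  # hidden activations
    err = (h @ W2 + b2) - y                   # prediction error
    # gradients of the mean-squared error (plain backpropagation)
    gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

mse = float(np.mean(((np.tanh(x @ W1 + b1) @ W2 + b2) - y) ** 2))
```

The fit improves systematically with the number of hidden nodes, in line with Hornik (1991); the growing inference cost of wider and deeper networks is precisely the practical trade-off noted in the a posteriori studies discussed below.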

In comparison to non-reacting flows, the use of machine-learning for modelling purposes in reacting flows is scarce, and it has primarily been used to model/accelerate the chemical kinetics (Chatzopoulos and Rigopoulos 2013; Ihme et al. 2009; Sen and Menon 2009; Sen et al. 2010). In terms of modelling, convolutional networks were successfully employed to model the Flame Surface Density (FSD) in Lapeyre et al. (2019), an important term in reacting LES (Nikolaou and Swaminathan 2018), and were shown to outperform classic state-of-the-art algebraic models. In Nikolaou et al. (2018, 2019) convolutional networks were used in a deconvolution-based context to model the scalar variance, a key modelling parameter in flamelet methods, while Seltz et al. (2019) employed convolutional neural networks to provide a unified modelling framework for both the source and scalar flux terms in the filtered scalar transport equation. With regards to modelling the stress tensor, categories (a)-(c) are discussed in the text which follows.

4.1 Type (a)

Probably the first application of machine-learning in LES with regards to the stress tensor dates to the work of Sarghini et al. (2003), in which a neural network was trained to predict the turbulent viscosity parameter in the Smagorinsky part of a mixed model (Smagorinsky+Bardina). The network was trained by first running LES at \(Re_{\tau }=180\) with Bardina's model, with the viscosity parameter calculated using the classic dynamic procedure. The data generated from the LES were then used to train the network to essentially replace the more expensive dynamic calculation of the viscosity parameter. The inputs consisted of the nine velocity gradients \(\partial {\bar{u}_i}/\partial {x_j}\) and the six velocity fluctuation products \(u'_i u'_j\). The network was four layers deep, 1(15)-2(12)-3(6)-4(1), with the numbers in parentheses indicating the number of neurons in each layer, and fully connected. The authors reported a 20% speedup in comparison to using the dynamic procedure, and that the network performed well for a range of \(Re_{\tau }\) close to the training Reynolds number. For larger Reynolds numbers, at \(Re_{\tau }=1050\), it was concluded that a new training procedure was required.

In a more recent study (Xie et al. 2019), a version of the Clark model presented in Sect. 2 was adopted, having two tunable parameters instead of one: one for the gradient part and the other for the Smagorinsky part. DNS data of compressible decaying turbulence were then used to train a neural network to predict these two parameters using as inputs the filtered velocity divergence \(\partial {\tilde{u}_i} / \partial {x_i}\), the filtered vorticity magnitude \(|\epsilon _{ijk}\partial {\tilde{u}_i} / \partial {x_j}|\), the filtered velocity gradient magnitude \(\sqrt{\partial {\tilde{u}_i} / \partial {x_j} \partial {\tilde{u}_i} / \partial {x_j}}\) and the filtered strain rate tensor magnitude \(\sqrt{\tilde{S}_{ij}\tilde{S}_{ij}}\). The developed networks showed improved performance over the static/dynamic Smagorinsky and classic Clark models in the a posteriori testing which followed.

4.2 Type (b)

The first direct modelling approach dates to the work of Gamahara and Hattori (2017), where DNS data of turbulent channel flow at \(Re_\tau =180\) were used for training the networks in the usual approach whereby the DNS data are filtered to simulate an LES. A range of possible inputs was tested: (a) \(\lbrace y, S_{ij} \rbrace \), (b) \(\lbrace y, S_{ij}, \Omega _{ij} \rbrace \), (c) \(\lbrace y, \partial {\bar{u}_i}/\partial {x_j} \rbrace \) and (d) \(\lbrace \partial {\bar{u}_i}/\partial {x_j} \rbrace \), where \(\Omega _{ij}=(\partial {\bar{u}_i}/\partial {x_j}-\partial {\bar{u}_j}/\partial {x_i})/2\) is the rotation-rate tensor, and y is the distance from the wall. In total, six three-layer fully connected networks were trained, i.e. one for each component of the stress tensor. Correlation coefficients were then extracted between the predicted components of the stress tensor and those extracted from the DNS. For the largest and most dominant streamwise component \(\tau _{11}\), all four sets showed similar correlations in the region of 0.8, with group (c) having the highest. This group was then tested a priori against DNS data at the higher Reynolds numbers \(Re_\tau =400\) and \(Re_\tau =800\), with overall good results. A posteriori tests at \(Re_\tau =180\) and \(Re_\tau =400\) were also conducted in the same study, with overall good results in comparison to the classic Smagorinsky model, although no clear advantage was reported by the authors.

In the same spirit as Gamahara and Hattori (2017), Wang et al. (2018) used DNS data to train a network to directly predict the stress tensor. The DNS data corresponded to homogeneous decaying turbulence at \(Re_{\lambda }=220\). Five different sets of inputs were tested using four-layer and five-layer networks: (a) \(\bar{u}_i\): 1(3)-2(20)-3(10)-4(1), (b) \(\partial {\bar{u}_i}/ \partial {x_j}\): 1(9)-2(40)-3(20)-4(1), (c) \(\partial ^2 {\bar{u}_i}/ \partial {x_j^2}\): 1(9)-2(40)-3(20)-4(1), (d) \(\partial ^2 {\bar{u}_i} / \partial {x_j} \partial {x_k}\): 1(9)-2(40)-3(20)-4(1) and (e) all of the previous inputs: 1(30)-2(90)-3(60)-4(30)-5(1). As in Gamahara and Hattori, one network was developed for each component of the stress tensor. Of all the inputs tested, groups (b) and (e) produced the highest correlations in a priori testing, although group (e) improved the correlations only marginally at the expense of a more complex network. The importance of using the velocity gradients, much like in the study of Gamahara and Hattori, was therefore confirmed, albeit in a different configuration. This is not surprising, since the velocity gradients appear in many models for the stress tensor. A further refined network based on group (b) was then developed, tested a posteriori in LES, and compared against the static and dynamic Smagorinsky models. The ANN model showed improved agreement in comparison to the two classic models, both in predicting the temporal evolution of the kinetic energy and its dissipation rate. In terms of computational cost, the ANN model was found to be 3.6 times slower than the static Smagorinsky model and 1.8 times slower than the dynamic Smagorinsky model, indicating that neural network models need to be as simple as possible to limit computational cost.

Following Wang et al. (2018), a similar procedure was applied in Zhou et al. (2019) to the same configuration, i.e. decaying homogeneous turbulence, in order to develop a network for the stress tensor. In contrast to the previous works (Gamahara and Hattori 2017; Wang et al. 2018), a single network was trained for all six components of the stress tensor while additionally taking into account the filter width, which along with the nine velocity gradients constituted the input set to the network. The evaluation was performed both a priori against the DNS data and a posteriori in LES, with the ANN-based model showing an overall improved performance in comparison to the dynamic Smagorinsky model.

In a more recent study (Park and Choi 2021) the case of turbulent channel flow was revisited. As in the work of Gamahara and Hattori (2017) similar inputs were tested with a four-layer network and six outputs instead. The inputs tested included single-point but also multiple-point variables along the streamwise and spanwise directions. The inputs consisted of (a) \(S_{ij}\)-single point (b) \(\partial {\bar{u}_i}/\partial {x_j}\)-single point, (c) \(S_{ij}\)-multiple points, (d) \(\partial {\bar{u}_i}/\partial {x_j}\)-multiple points and (e) \(\lbrace \bar{u}_i, \partial {\bar{u}_i}/\partial {x_j} \rbrace \)-multiple points. In the a priori tests it was found that the groups (c) and (d) provided the highest correlations and reasonably predicted the backscatter. However, in a posteriori tests it was found that these inputs led to instabilities unless backscatter clipping was used. The single-point group (a) on the other hand showed very good agreement in the a posteriori tests despite the lower correlations observed in the a priori tests.

In reacting flows, an a posteriori study using a closely related data-based approach was presented in Schoepplein et al. (2018), where Gene-Expression Programming (GEP) was employed. In this approach \(\tau _{ij}\) was assumed to depend on the strain-rate and rotation-rate tensors \(S_{ij}\) and \(\Omega _{ij}\) respectively (as in Gamahara and Hattori 2017), but also on the filter width \(\Delta \) and the filtered density \(\bar{\rho }\). GEP was then used to derive a best-fit function for the stress tensor, which showed good agreement against the DNS data.

The direct modelling approach for reacting flows was first examined in Nikolaou et al. (2021). A DNS database of a turbulent premixed hydrogen V-flame was used in order to train a network to predict all six components of the stress tensor, using as inputs the filtered density \(\bar{\rho }\) and the nine velocity gradients \(\partial {\bar{u}_i}/ \partial {x_j}\) (suitably normalised). In comparison to previous studies in the literature, this DNS configuration was particularly challenging to model due to the strong inhomogeneity in the direction perpendicular to the mean stream-wise flow, the presence of a bluff body, and the presence of heat release modelled using detailed chemistry; the configuration is shown in Fig. 1. The lowest turbulence cases V60 and V60H (\(Re_T=220\)) were used for training the networks, while the highest turbulence level case V97 (\(Re_T=562.8\)) was used for testing. A 1(10)-2(40)-3(10)-4(18)-5(6) network structure was developed for each filter width considered, able to predict all six components of the stress tensor (Nikolaou et al. 2021). In contrast to previous studies employing fully connected layers, and in order to account for the strong inhomogeneity in the cross-stream directions, it was found necessary to decouple layers 4 and 5 by introducing 3-to-1 rather than fully connected connections between these two layers.
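A sketch of such a partially decoupled structure is given below: layers 1 to 4 are fully connected, while each of the six outputs of layer 5 is connected to only three neurons of layer 4. The weights are untrained placeholders and the tanh activations are assumptions; this is an illustration of the connectivity, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fully connected part: 10 inputs -> 40 -> 10 -> 18 (untrained placeholder weights).
W_fc = [rng.normal(0.0, 0.1, shape) for shape in [(40, 10), (10, 40), (18, 10)]]
# Decoupled final layer: output k uses only neurons 3k..3k+2 of layer 4,
# i.e. 3-to-1 rather than fully connected connections.
W_out = rng.normal(0.0, 0.1, (6, 3))

def predict_tau(x):
    """Forward pass of a 1(10)-2(40)-3(10)-4(18)-5(6) structure (a sketch)."""
    h = x
    for W in W_fc:
        h = np.tanh(W @ h)
    return np.array([W_out[k] @ h[3 * k:3 * k + 3] for k in range(6)])

tau = predict_tau(rng.normal(size=10))   # six stress components at one point
print(tau.shape)                          # (6,)
```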

Fig. 1
figure 1

Averaged (in homogeneous y direction) instantaneous progress variable \(c=(T-T_r)/(T_p-T_r)\) for all three cases (\(T_r=\)reactants temperature, \(T_p=\)products temperature). Note that the databases are 3D: cases V60 and V60H used for training and case V97 (highest turbulence level case) used for testing (Nikolaou et al. 2021)

A thorough a priori comparison against all models presented in Sect. 2 was conducted for all three filter widths considered i.e. at \(\Delta /\delta _L=1,2\) and 3 where \(\delta _L\) is the laminar thermal flame thickness. Figures 2 and 3 show the instantaneous predictions (normalised) of all models considered for the largest filter width for the dominant components \(\tau _{11}\) and \(\tau _{13}\) respectively. These results are quantified in terms of the Pearson correlation coefficient for each individual component of the stress tensor averaged over all filter widths in Fig. 4. The results show that the networks are able to outperform the predictions obtained using the classic models while the work in Nikolaou et al. (2021) also confirmed the results found in Klein et al. (2015) on the poor performance of the Smagorinsky model (static and dynamic) for reacting flows.

Fig. 2
figure 2

Scatter plots of instantaneous values of DNS and modelled \(\tau _{11}\) on the LES mesh, for \(\Delta ^+=3\) (Nikolaou et al. 2021)

Fig. 3
figure 3

Scatter plots of instantaneous values of DNS and modelled \(\tau _{13}\) on the LES mesh, for \(\Delta ^+=3\) (Nikolaou et al. 2021)

Another important point to consider in the model evaluation step is the ability of a model to predict the correct relative magnitude between the different stress components, which amounts to evaluating the alignment angle between the DNS and modelled resultant stress in a given direction. A perfect model would yield a zero alignment angle between the modelled and DNS stresses in a particular direction, and the probability density function would approach a \(\delta \) function at zero. This evaluation step is particularly important in flows with strong inhomogeneities, since in such cases one must ensure that the model’s predictions are not biased towards any of the dominant or non-dominant components of the stress tensor. Therefore, in a further evaluation step in Nikolaou et al. (2021), probability density functions of the alignment angle between the modelled and DNS stress tensor \(\tau _{j1}\) were extracted and compared for each model. The results are shown in Fig. 5, where it is apparent that the ANN-based model shows an improved performance in comparison to the classical models.
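The alignment angle at a point follows directly from the two stress vectors; a minimal sketch:

```python
import numpy as np

def alignment_angle(tau_dns, tau_mod):
    """Angle (radians) between the DNS and modelled stress vectors
    tau_j1 = (tau_11, tau_21, tau_31) at a single point; 0 for perfect alignment."""
    c = np.dot(tau_dns, tau_mod) / (np.linalg.norm(tau_dns) * np.linalg.norm(tau_mod))
    return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against round-off

# A model correct up to magnitude gives a (near-)zero angle:
print(alignment_angle(np.array([1.0, 0.2, -0.1]), np.array([2.0, 0.4, -0.2])))
```

Collecting this angle over all points of the filtered field and histogramming it yields the probability density functions of Fig. 5.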

Fig. 4
figure 4

Pearson correlation coefficients averaged across all filter widths, for each stress tensor component for case V97 (Nikolaou et al. 2021)

Fig. 5
figure 5

Probability density function of the angle \(\theta \) between DNS \(\tau _{j1}\) and modelled \(\tau ^m _{j1}\) for \(\Delta ^+ =3\) (Nikolaou et al. 2021)

4.3 Type (c)

The first use of machine learning in a deconvolution-based context dates to the work of Maulik and San (2017), where a single-layer network with 100 neurons was trained to recover estimates of the unfiltered velocity components \(u^{*}_i\) from their filtered counterparts \(\bar{u}_i\). The inputs to the network consisted of the filtered velocity components in the neighbourhood of a given point. This enabled the direct modelling of the stress tensor using explicit filtering on the deconvolved variables. The developed networks were tested a priori for different cases including 2D Kraichnan turbulence, 3D Kolmogorov turbulence and compressible stratified turbulence, with overall good results.

In the same spirit, a neural network was trained in Yuan et al. (2020) to reconstruct the unfiltered velocity components, and was tested both a priori against the DNS data and a posteriori in LES of forced isotropic turbulence. The inputs consisted of the filtered velocities in the region surrounding a given point, as in Maulik and San (2017), and the outputs consisted of the three unfiltered velocity components, which were then filtered explicitly to model the stress tensor as in classical deconvolution-based approaches. In a posteriori testing, the ANN-based models provided improved predictions over the dynamic Smagorinsky model.
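Once an estimate \(u^{*}_i\) of the unfiltered velocity is available (from a trained network or an iterative deconvolution), the stress tensor follows by explicit filtering, \(\tau _{ij} \approx \overline{u^{*}_i u^{*}_j} - \bar{u}^{*}_i \bar{u}^{*}_j\). A sketch of this last step, using a simple periodic top-hat filter as a stand-in for whichever explicit filter is adopted:

```python
import numpy as np

def box_filter(f, w=3):
    """Periodic top-hat filter of (odd) width w points along each axis;
    a simple stand-in for the explicit LES filter."""
    r = w // 2
    for ax in range(f.ndim):
        f = sum(np.roll(f, s, axis=ax) for s in range(-r, r + 1)) / (2 * r + 1)
    return f

def sgs_stress(u_star, w=3):
    """tau_ij from deconvolved velocities u_star = [u*, v*, w*] by explicit
    filtering: tau_ij = bar(u*_i u*_j) - bar(u*_i) bar(u*_j)."""
    ub = [box_filter(u, w) for u in u_star]
    return {(i, j): box_filter(u_star[i] * u_star[j], w) - ub[i] * ub[j]
            for i in range(3) for j in range(i, 3)}
```

For a spatially uniform velocity field the filtered products and the products of filtered fields coincide, so all six components vanish, as they should.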

5 A Note: Sub-grid Versus Sub-filter

It is important to note that the terms “sub-grid” and “sub-filter” are different. “Sub-grid” refers to scales not resolved by the mesh h, while “sub-filter” refers to scales not resolved by the filter width \(\Delta \). In the majority of classic approaches \(h/ \Delta =1\) and the two terms are equivalent. However, in approaches which include deconvolution and/or machine learning, \(h / \Delta < 1\), in which case the terms are not equivalent: “sub-filter” then refers to scales between h and \(\Delta \) which are resolved by the mesh and can potentially be recovered, e.g. using deconvolution and/or suitably trained neural/convolutional networks.

6 Challenges of Data-Based Models

6.1 Universality

As the name suggests, data-based methods depend on data. One can view machine-learning methods such as ANNs and CNNs as a multi-dimensional data-fitting procedure. As a result, the predictive ability of a network depends on the dataset. For datasets not too dissimilar to the dataset used to train the network in the first place, the predictions are expected to be reasonably good, since in such cases inference is equivalent to a form of high-dimensional interpolation. For datasets which are too dissimilar (which lie far from the multi-dimensional fitted surface), the predictions are expected to be poorer, since in such cases inference is equivalent to extrapolation. For instance, a neural network trained solely on homogeneous decaying turbulence data to predict the stress tensor would probably perform poorly in shear-dominated flows, and vice versa. Increasing the training data size is always an option; however, this would lead to even more complex networks with increased computational cost. Another option would be to train case-specific networks and switch between them depending on the local flow configuration. In general, the universality of a network depends on the size, quality, and diversity of the databases used for training.

6.2 Choice and Pre-processing of Data

Any inputs to a data-driven model need to be appropriately scaled, and standardization is a commonly used procedure for this purpose. In the turbulence modelling community, such standardization is usually performed on input variables which have already been appropriately normalized using physical quantities such as the mean flow velocity and a turbulence length scale. However, it is often the case that such reference quantities are not available, or that they do not necessarily represent the flow phenomena in practical problems. For example, non-reacting flow DNS is often performed in non-dimensional form. One way to train a model is to use such non-dimensional quantities as they are, with or without standardization. While such a strategy would not require normalization based on physical quantities for training, when applying the resulting model to practical LES problems one faces the issue of finding appropriate parameters with which to non-dimensionalize the quantities.
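A minimal sketch of the standardization step, emphasising that the training-set statistics must be stored and reused at inference time rather than recomputed on the LES data:

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Scale each input feature to zero mean and unit variance.
    X has shape (n_samples, n_features); pass the stored training-set
    mean/std when transforming inference-time inputs."""
    if mean is None:                      # training phase: compute statistics
        mean, std = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / std, mean, std

# Training inputs (e.g. non-dimensional velocity gradients from DNS):
X_train = np.random.default_rng(0).normal(3.0, 2.0, (1000, 5))
X_scaled, mu, sigma = standardize(X_train)
# At inference, reuse mu and sigma: standardize(X_new, mu, sigma)
```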

6.3 Training, Validation, Testing

Developing a model based on machine learning typically involves three steps, namely training, validation, and testing. The validation step is typically performed during the training phase on a subset of the training data, while the chosen testing dataset varies from study to study. In some studies, for instance, the testing dataset is also a subset of the training dataset, albeit at different spatio-temporal coordinates within the computational domain. This approach is convenient, as there is no need to perform additional and often expensive simulations to generate new data, e.g. at a higher Re or Ma number. However, this approach may introduce a bias in the assessment of the predictive ability of the network, since the testing dataset may be too similar to the training/validation datasets. Therefore, careful thought is required on the most appropriate training and testing strategy.

6.4 Network Structure

The choice of network structure is typically made on a trial-and-error basis, and to date there is no formal/theoretical procedure to obtain a priori the best network structure (number of layers, number of nodes, type of activation function, type of loss function) for a given set of inputs and outputs which minimises the training error. In addition, increasing the number of layers and/or nodes does not always improve the predictive ability of the network. Furthermore, and perhaps more importantly, there is no formal way of choosing a priori the best set of input variables for a given output set and network structure; typically, a range of inputs is tested based on intuition.

When it comes to practical LES, some networks are more difficult to implement and parallelise in LES solvers than others. For instance, point-wise inputs are very convenient for LES applications, while inputs requiring the values at surrounding mesh points are tricky to implement and parallelise in practice using MPI. This is often the case with CNNs and other types of networks utilizing plane and volumetric inputs on Cartesian mesh points. However, most LES codes employ non-uniform and unstructured meshes. Of course, the fields can be interpolated to generate CNN-like inputs at every point at every iteration, but this would result in increased computational cost and other associated issues (Kashefi et al. 2021). One potential strategy to circumvent this issue while retaining the important spatial information in the inputs is so-called “point-cloud deep learning” (Kashefi et al. 2021). Although this framework is not yet well established for modelling the stress tensor, compatibility with arbitrary mesh geometries is something future machine-learning models should consider.

6.5 LES Mesh Size

The development of LES models using DNS data involves explicit filtering operations with a filter size \(\Delta \). An important question is then how one chooses h, i.e. the LES mesh size. Typically in classic approaches \(h/ \Delta =1\), but this choice does not ensure that the resolved fields, such as the velocity and scalar fields, are well-resolved. Consequently, the gradients of these variables as obtained on the LES mesh, which are typically used as inputs to neural networks, are also not well-resolved, which introduces a bias in the predictive ability of the network; this is also the case when evaluating the performance of classic models which involve gradient terms.

In an effort to resolve this, Nikolaou and Vervisch (2018) proposed a criterion for the LES mesh size, based on a scalar \(\phi (x)\) \((0\le \phi \le 1)\) varying from 0 to 1, which was originally proposed for a “reaction progress variable” (e.g. non-dimensional temperature) but which can also be regarded as a normalized fluctuating velocity component,

$$\begin{aligned} \phi (x)=\frac{1}{2}\left( 1+erf\left( \frac{x\sqrt{\pi }}{\delta }\right) \right) , \end{aligned}$$
(29)

where \(\delta \) is a length scale for the gradient defined as \(\delta =1/\max (d\phi /dx)\). Filtering Eq. (29) based on the filtering operation (Eq. (2)) with a Gaussian kernel, the filtered field \(\bar{\phi }(x)\) can be obtained as,

$$\begin{aligned} \bar{\phi }(x) = \frac{1}{2}\left( 1+erf \left( \frac{1}{\sqrt{1+\frac{\pi }{6}\frac{\Delta ^2}{\delta ^2}}} \frac{x\sqrt{\pi }}{\delta } \right) \right) . \end{aligned}$$
(30)

The length scale for the gradient of the filtered field can be obtained in the same manner as \(\delta =1/\max (d\bar{\phi }/dx)\), which leads to

$$\begin{aligned} \bar{\delta } = \delta \left( 1+\frac{\pi }{6}\frac{\Delta ^2}{\delta ^2} \right) ^{1/2}, \end{aligned}$$
(31)

ensuring \(\bar{\delta }/\delta >1\) i.e. that the length scale increases due to the filtering operation. It would be more useful to rewrite Eq. (31) in terms of \(\bar{\delta }/\Delta \), since our interest here is how fine the mesh should be to capture the gradient information of the filtered field with \(\Delta \),

$$\begin{aligned} \frac{\bar{\delta }}{\Delta }=\left( \frac{\pi }{6} + \frac{\delta ^2}{\Delta ^2} \right) ^{1/2}. \end{aligned}$$
(32)

Usually, to resolve a filtered gradient, n mesh points are required within \(\bar{\delta }\), which results in,

$$\begin{aligned} \frac{h}{\Delta }=\frac{1}{n}\left( \frac{\pi }{6}+\frac{\delta ^2}{\Delta ^2}\right) ^{1/2}. \end{aligned}$$
(33)

In most turbulent flows, it is expected that \(\delta /\Delta \sim 0\). Equation (33) yields \(h/\Delta \simeq 0.36\) for \(n=2\) (two mesh points within the filtered slope), and \(h/\Delta \simeq 0.18\) for \(n=4\), leading to the insight that the LES mesh required to capture the filtered gradient should have two to five mesh points within \(\Delta \). This consideration is required when generating filtered quantities from resolved fields such as DNS, especially for machine-learning with gradient-related inputs, but is also useful for conventional gradient model assessments.
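Equation (33) is straightforward to evaluate; the short sketch below reproduces the quoted limits for \(\delta /\Delta \rightarrow 0\):

```python
import numpy as np

def h_over_Delta(n, delta_over_Delta=0.0):
    """Mesh-to-filter ratio from Eq. (33):
    h/Delta = (1/n) * sqrt(pi/6 + (delta/Delta)^2)."""
    return np.sqrt(np.pi / 6.0 + delta_over_Delta**2) / n

print(round(h_over_Delta(2), 2))  # 0.36: two points within the filtered slope
print(round(h_over_Delta(4), 2))  # 0.18: four points within the filtered slope
```

Larger \(\delta /\Delta \) relaxes the criterion, since the filtered gradient is then easier to resolve relative to \(\Delta \).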

6.6 Performance Metrics

The quantification of prediction accuracy is very important, since in modelling the stress tensor a model assessment needs to be performed spatio-temporally and for all six components of the stress tensor; visual examination alone is not enough. Amongst the possible quantification methods, the mean squared error (MSE) would be the most convenient to use since it is already incorporated in the loss function of most machine-learning algorithms. Another choice is the root mean squared error (RMSE). However, the MSE and RMSE are sensitive to local outliers, which are prevalent in non-linear phenomena. For this reason, the mean absolute error (MAE) may be more suitable for model assessment purposes.

Fig. 6
figure 6

Scatter plots of target values \(y_i\) and predicted values \(\hat{y}_i\). a–f: scenarios (a) to (f), respectively

In various model developments in the turbulent-flow community, the cross-correlation coefficient is also used extensively. While this quantity is familiar to the community, relying on this coefficient alone can bias the model performance assessment significantly. This point is illustrated using the following simulated target values \(y_i\) and predicted values \(\hat{y}_i\) in scenarios (a) to (f), where i is the index of the N samples.

  (a) Predicted values are scattered around the target values.

  (b) Predicted values are scattered around the target values, but 15% of samples have much larger deviation (outliers).

  (c) Predicted values are scattered around the target values, but 30% of samples have much larger deviation (outliers).

  (d) Predicted values are scattered around the line \(\hat{y}_i=0.5y_i+0.25\). The deviation from the line is the same as in (a).

  (e) Predicted values are scattered around the line \(\hat{y}_i=y_i+0.15\). The deviation from the line is the same as in (a).

  (f) Predicted values are scattered around the line \(\hat{y}_i=y_i+0.30\). The deviation from the line is the same as in (a).

Scenario (a) represents perhaps a good model. In turbulent flow problems where the variables take a wide range of values, however, even such a good model may output predictions with large deviations for a limited number of samples, and such situations may correspond to scenarios (b) and (c). Situations where the trend of the predicted values is close to the target values but there is some deviation between the two may correspond to scenarios (d), (e) and (f). Examples of these scenarios are shown in Fig. 6.

For the scenarios (a)–(f), the following metrics often used for model assessments are considered,

  • Mean absolute error

    $$\begin{aligned} \epsilon _\textrm{MAE} = \frac{\sum _{i=1}^N \left| y_i-\hat{y}_i \right| }{N}. \end{aligned}$$
    (34)
  • Relative mean absolute error

    $$\begin{aligned} \epsilon _\textrm{rMAE} = \frac{\epsilon _\textrm{MAE}}{\bar{y}}. \end{aligned}$$
    (35)
  • Mean squared error

    $$\begin{aligned} \epsilon _\textrm{MSE} = \frac{\sum _{i=1}^N \left( y_i-\hat{y}_i \right) ^2}{N}. \end{aligned}$$
    (36)
  • Root mean squared error

    $$\begin{aligned} \epsilon _\textrm{RMSE} = \sqrt{\epsilon _\textrm{MSE}}. \end{aligned}$$
    (37)
  • Relative root mean squared error

    $$\begin{aligned} \epsilon _\textrm{rRMSE} = \frac{\epsilon _\textrm{RMSE}}{\bar{y}}. \end{aligned}$$
    (38)
  • Pearson’s cross-correlation coefficient

    $$\begin{aligned} \rho _p = \frac{\sum _{i=1}^N \left( y_i - \bar{y} \right) \left( \hat{y}_i - \bar{\hat{y}} \right) }{ \sqrt{\sum _{i=1}^N \left( y_i - \bar{y} \right) ^2} \sqrt{\sum _{i=1}^N \left( \hat{y}_i - \bar{\hat{y}} \right) ^2} } \end{aligned}$$
    (39)
  • Coefficient of determination

    $$\begin{aligned} R^2 = 1 - \frac{ \sum _{i=1}^{N} \left( y_i-\hat{y}_i\right) ^2 }{ \sum _{i=1}^{N} \left( y_i-\bar{y} \right) ^2 } \end{aligned}$$
    (40)
  • Coefficient of Legates and McCabe (2013)

    $$\begin{aligned} E_1 = 1 - \frac{ \sum _{i=1}^{N} \left| y_i-\hat{y}_i\right| }{ \sum _{i=1}^{N} \left| y_i-\bar{y} \right| } \end{aligned}$$
    (41)

In the list above \(\bar{\cdot }\) denotes the mean value. The metrics \(\rho _p\), \(R^2\) and \(E_1\) yield 1 for a perfect model. All of the above metrics are computed and summarised in Table 1 for scenarios (a)–(f). Note that \(\rho _p^2\) is also shown, since it is often used as an alternative definition of the coefficient of determination. As clearly seen, the cross-correlation coefficient \(\rho _p\) shows relatively high values for all the scenarios except (c), where \(\rho _p=0.64\), which may still be acceptable for certain purposes. However, there is a substantial discrepancy between the intuitive interpretation of Fig. 6 and the values of \(\rho _p\) in Table 1 for scenarios (d)–(f): for these cases the relative errors \(\epsilon _\textrm{rMAE}\) and \(\epsilon _\textrm{rRMSE}\) vary from 25% to 63%, while \(\rho _p=0.98\). Also, \(\epsilon _\textrm{rRMSE}\) and \(R^2\) tend to be more sensitive to large deviations of a small number of samples than \(\epsilon _\textrm{rMAE}\) and \(E_1\), respectively (see scenario (b)), due to the \((y_i-\hat{y}_i)^2\) terms. These considerations suggest that model assessments based on \(\rho _p\) alone cannot thoroughly and accurately assess a model’s performance, and \(\rho _p\) should be used along with visual examination and/or another metric.
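The bias discussed above is easy to reproduce with synthetic data. The sketch below builds a scenario-(f)-like prediction (a constant offset of 0.30 plus small noise, with hypothetical sample values) and shows that \(\rho _p\) remains close to 1 while the relative error and \(E_1\) reveal the poor fit:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.uniform(0.0, 1.0, 10_000)                    # target samples
y_hat = y + 0.30 + rng.normal(0.0, 0.02, y.size)     # scenario (f)-like model

rho_p = np.corrcoef(y, y_hat)[0, 1]                  # Eq. (39)
eps_rmae = np.mean(np.abs(y - y_hat)) / y.mean()     # Eq. (35)
E1 = 1.0 - np.sum(np.abs(y - y_hat)) / np.sum(np.abs(y - y.mean()))  # Eq. (41)

print(f"rho_p = {rho_p:.3f}")     # close to 1 despite the constant bias
print(f"rMAE  = {eps_rmae:.2f}")  # large relative error
print(f"E1    = {E1:.2f}")        # negative: worse than predicting the mean
```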

Table 1 Performance metrics computed for the simulated target values \(y_i\) and predicted values \(\hat{y}_i\) of scenarios (a)–(f)

7 Summary

Machine-learning methods are increasingly being used by the fluid mechanics community for modelling purposes, and in particular for the unresolved stress tensor. The applications are diverse, and a large number of both a priori and a posteriori assessments have shown data-based methods either to outperform the predictions of classic models or at least to match them. The developed networks are typically one to five layers deep, with around one hundred neurons in each hidden layer, and with the structure of the networks varying from study to study. Overall, the best-performing inputs appear to be the gradients of the filtered velocity components and functions of the velocity gradients, such as the strain-rate and rotation-rate tensors, irrespective of the nature of the flow, i.e. reacting or non-reacting. The computational cost depends on the structure of the network, with most of the networks developed in the literature, despite being slower than the classical algebraic models, exhibiting a cost of around the same order of magnitude. However, despite the success of the developed networks, some important issues remain, as discussed in the text. The most important, in the authors' view, is universality. The predictive ability and versatility of a network is tightly coupled to the dataset used for training in the first place. At present, in the majority of studies in the literature these databases are restricted to small-scale DNS of often canonical flow problems such as decaying homogeneous turbulence, turbulent channel flow, statistically planar freely-propagating flames etc., while in practical LES the flows are significantly more complex and at higher Re and Ma numbers.
In order to overcome this issue and to eventually obtain a truly case-independent and parameter-free machine-learning-based model for the stress tensor, further research is required at conditions which are more relevant for practical flows including both a priori and a posteriori studies.