Under-resolved and large eddy simulations of a decaying Taylor–Green vortex with the cumulant lattice Boltzmann method

We present a comprehensive analysis of the cumulant lattice Boltzmann model with the three-dimensional Taylor–Green vortex benchmark at Reynolds number 1600. The cumulant model is investigated in several different variants, using regularization, fourth-order convergent diffusion and fourth-order convergent advection with and without limiters. In addition, a cumulant model combined with a WALE sub-grid scale model is being evaluated. The turbulence model is found to filter out the high wave number contributions from the energy spectrum and the enstrophy, while the non-filtered cumulant methods show good correspondence to spectral simulations even for the high wave numbers. The application of the WALE turbulence model appears to be counter productive for the Taylor–Green vortex at a Reynolds number of 1600. At much higher Reynolds numbers (Re=160,000\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hbox {Re}}=160{,}000$$\end{document}) a deviation from the ideal Kolmogorov theory can be observed in the absence of an explicit turbulence model. Cumulant models with fourth-order convergent diffusion show much better results than single relaxation time methods.


Introduction
The purpose of this paper is to investigate and highlight the effect of various parameter choices and modelling aspects on the performance of a cumulant lattice Boltzmann model based on the decay of a three-dimensional Taylor-Green vortex as an example for a canonical transitional flow. Recent years have seen the upcoming of many new lattice Boltzmann models based on various modelling rationals. The promise behind the various models is to provide more efficient, more accurate or more stable results than previous lattice Boltzmann models. While derivation details between the many variants differ, the novelty as compared to the classical single and multiple relaxation time models is found in one or several of the following categories: • Advanced discretizations of the momentum space • Advance equilibrium functions • Advanced bases transformations for collision • Adaptive relaxation rates • Optimized relaxation rates • Non-local collision terms From this list we explicitly exclude the wide field of non-lattice conforming discretizations such as finite difference lattice Boltzmann [1], discontinuous Galerkin lattice Boltzmann [2] and semi-Lagrangian lattice Boltzmann [3]. While all these approaches are interesting and potentially superior to the classical stream-onlattice and collide-on-nodes approach, they have diverged so considerable from the original lattice Boltzmann model that they qualify as independent models to be discussed in a different context.
Advanced discretization of momentum space usually involves the utilization of many discrete speeds [4][5][6][7], or a lattice structure other than the Cartesian simple cubic lattice (e.g. body-centred cubic lattices [8,9]). From a finite difference perspective it is most natural to try to improve a method by enlarging or rearranging its stencil. However, in the lattice Boltzmann method progress in this direction has always been very slow and unsatisfactory. Substantial advances would require velocity sets with non-rational ratios between the velocities reminiscent of Gauss integration points [10]. The success of mono-speed lattice Boltzmann models can partly be attributed to the simple fact that a single velocity can always be scaled onto a conforming integer lattice. Therefor, three-point symmetric Gauss integration is possible on an Cartesian lattice, but this cannot be generalized to more than three points per direction.
Advanced equilibrium functions are improvements over the classical Taylor expanded Maxwellian used in classical single or multiple relaxation time methods [11,12]. The simplest approach is to increase the order of the Taylor expansion from two to three or higher. Some entropic lattice Boltzmann models use non-polynomial equilibrium functions [13].
Advanced bases transformations are used in various new multiple relaxation time methods. The idea is always that the collision operator is not applied to the distributions directly but to some sort of moments such that different relaxation parameters can be attached to different observable quantities. While the original multiple relaxation time models used unweighted orthogonal raw moments [14][15][16], a wide range of other options have now been promoted, including central moments [17][18][19], Hermite moments [20] and cumulants [21]. Besides these complete transformations there exist methods separating hydrodynamic relevant information from so-called ghost modes using only a small set of categories (two or three). Hydrodynamic modes are than relaxed as in the single relaxation time method, while the ghost modes are treated separately, for example by regularization [22,23] or by adaptive relaxation [24].
Beside several adopted standard turbulence models, the most prominent methods applying adaptive relaxation times are the entropic models. The original entropic models were single relaxation time models with the relaxation time chosen to fulfil a discrete H-theorem replacing physical entropy by information entropy [13,25]. More recently entropic models using two relaxation rates were promoted. They use a fixed relaxation rate for the hydrodynamic moments and a modulated one for the ghost moments [24]. Applying limiters to the relaxation of some high-order cumulants technically also falls into the category of adaptive relaxation rates [26].
Optimization of relaxation rates cannot be discussed separated from the type of basis transformation applied. The basic idea is always that the governing equation constrains only some of the available relaxation rates for the various non-conserved observable quantities. The remaining rates can then be chosen to fulfil other objectives. The concept of magic relations between hydrodynamic and non-hydrodynamic relaxation times in lattice Boltzmann originated in the 1990s [27]. When distinguishing only between the relaxation of odd and even moments, it had been observed that the error in the lattice Boltzmann method was a function of a certain combination of the two relaxation rates such that the error could be kept constant with respect to varying viscosities by keeping the particular function constant [28]. However, this by itself did not mean that the error was small. Xu and coworkers proposed a sensitivity analysis for the free relaxation parameters in the MRT model and optimized them with regard to minimal dispersion and dissipation errors [29]. Modena and co-workers used a different objective and optimized relaxation rates according to a k − 1% rule with the aim to increase dissipation at high wave numbers in order to improve stability [30]. The current authors used the objective to eliminate the leading-order error in diffusion in the asymptotic expansion of a cumulant lattice Boltzmann model and found a general analytical solution [26].
Non-local collision terms refer to all methods that invoke correction terms based on finite differences. A recent example is the hybrid recursive regularized lattice Boltzmann [31]. Finite differences can also be used to correct for the phase error in lattice Boltzmann [32].
The cumulant method aims at incorporating most of the progress discussed above with the exception of large stencils. Among the lattice Boltzmann models on standard (monospeed) lattices, the cumulant method is the only one for which fourth-order convergence of momentum diffusion and advection could be attained to date [26,32]. The cumulant transform implicitly incorporates an extended equilibrium and a recursive regularization by design. Unlike regularized methods it retains the free relaxation rates for higher-order cumulants which can then be used for elimination of the leading-order error terms [26] in momentum diffusion. The elimination of the leading error in advection requires a correction term based on finite differences [32]. The determination of both the diffusion and the advection correction used asymptotic expansion techniques in diffusive scaling. The diffusive scaling assumes a limit where the time step scales with the grid spacing squared. The diffusive scaling is the only scaling for which convergence of the lattice Boltzmann method to the Navier-Stokes equation can be formally shown without any assumptions on the Mach number [33]. This is not only true for the incompressible Navier-Stokes equation, but it is intrinsic to the fact that a static velocity set is used in connection with explicit time marching algorithm that allows information only to travel a finite distance per time step. Despite this theoretical obstacle it has become quite popular to apply the lattice Boltzmann method to aero-acoustic problems and to dismiss analysis based on diffusive scaling accordingly [20]. It is therefor of considerable practical interest to investigate the fourth-order convergent cumulant method in both diffusive and acoustic scaling, which is one of the main aims of the current study. Further, in order to highlight the respective contributions of the various measures taken in the development of the cumulant method we will apply the method using various configurations. Since our aim is to investigate the influence of various modelling options in different asymptotic scalings, we have to conduct a large number of simulations even for a single test case. In this paper we will therefor limit ourselves to the three-dimensional Taylor-Green vortex as a generic test for isotropic transitional flow. By limiting ourselves to this particular flow we are able to provide a dense sampling of various modelling parameters.

Differences between the cumulant model and other lattice Boltzmann variants
The standard lattice Boltzmann model approximates the kinetic momentum distribution function f through a small set of discrete distributions f i jk with finite velocities (i, j, k) T x/ t and i, j, k ∈ Z arranged on a regular lattice with grid spacing x such that each particle moves from one lattice node to another lattice node in a fixed time step t. This arrangement requires all lattice nodes to be located on a regular grid. The movement of the discrete distributions is modified with the application of the collision operator i jk . The restrictions of the lattice can be relaxed or mitigated by various interpolation schemes, more sophisticated discretizations [1][2][3] or by allowing a large number of discrete velocities [6,12]. These approaches, albeit interesting, are not subject of the current paper. Instead we will focus on small regular lattices, where i, j, k ∈ {−1, 0, 1}. While for these models the choice of the velocity set is rather restrictive, there is a considerable freedom in the design of the collision operator and a large number of proposals has been put forward in the last three decades. The very first lattice Boltzmann models were based on matrix collision operators [34]. They were quickly replaced by the BGK collision operator [35] which relaxes the distribution directly towards an equilibrium distribution using a single relaxation time. This approach is motivated by the classical BGK collision operator used in kinetic theory [36].
The current study focuses on the cumulant method [21,26,[37][38][39][40][41][42][43][44][45]. In the cumulant method we transform the distribution into a set of statistically independent observable quantities. This is mathematically expressed by demanding that the joint probability distribution of all observable quantities is the product of their individual distribution functions. This requirement is met by the cumulant generating function being the logarithm of the moment generating function. Cumulants appear to be a very abstract mathematical construction. However, applied to the momentum distribution functions they turn out to be very intuitive quantities of which humans have been aware for a very long time. It is quite striking that the leading-order cumulants just turn out to be velocity and temperature without even requiring normalization to unit density. In fact, the most compact and concise definition of temperature is saying that it is the trace of the second-order cumulant tensor of the momentum distribution function.
In addition to the basis transformation, recent research has focused on the role of the equilibrium distribution [46,47]. The cumulant method intrinsically uses the same number of monomials in the equilibrium function as the number of discrete velocities. It has been assumed for a long time that this maximizes stability and accuracy [48,49].
Among the recent new lattice Boltzmann models the cumulant model stands out in a few aspects. These include: • A parametrization resulting in fourth-order spatial accuracy of the viscous term has been identified [26].
• Fourth-order spatial accuracy of the advection term can be obtained with a small correction involving finite differences [32].
• Even without this correction, the viscosity is Galilean invariant, i.e. there are no cubic error terms in the viscosity known to plague almost all competing lattice Boltzmann models [21]. • The cumulant method does not coincide with the BGK model if all relaxation rates are identical [21,26].
The fact that the cumulant method is not identical to the BGK method if used in single relaxation time (SRT) mode is surprising as this is the case for all other multiple relaxation time bases we are aware of. In the lattice Boltzmann literature BGK and SRT are usually used synonymous. Here we briefly discuss the reason why this is not the case for a cumulant basis.
The BGK collision operator ∂ c t f for the continuous Boltzmann equation is stated as: where f is the momentum distribution, f eq is the equilibrium distribution, ξ is the microscopic particle velocity vector, and τ is the single relaxation time being the only adjustable parameter of this model. In the discrete form used later for the lattice Boltzmann equation the relaxation time has to be corrected [50] which has been interpreted in different ways, either as an integration along characteristics [51] or as a consequence of Strang splitting [52]. Instead of writing Eq. (1) in terms of the distribution function f we can write it in terms of the Fourier transformed distribution function F = F{ f }: Since the Fourier transform and Eq. (1) are linear, it is easy to see that the equation does not change its form. The Fourier transform of a distribution function is also called its moment-generating function. Various types of moments are obtained by taking derivatives of the moment-generating function in various combinations. In that way, the moment-generating function gives rise to the various different bases used for multiple relaxation time lattice Boltzmann models, ranging from raw moments over central moments to Hermite moments including all different sorts of orthogonalization and regrouping. Due to the linearity of Eq. (1) and due to the fact that all moments are obtained by linear combinations of derivatives of the moment-generating function, it is easy to see that if all moments are relaxed with same relaxation time, Eq. (2) and consequently also Eq. (1) are recovered. It is hence obvious that all moment methods recover the same BGK operator if they assign the same relaxation time to each moment and if they have the same equilibrium function. Cumulants behave differently. The cumulant generating function is the logarithm of the moment-generating function. Applying a single relaxation time to the cumulant generating function leads to: Solving for ∂ c t F in Eq. (3) and putting it into the Boltzmann equation gives: We will call Eq. (4) the cumulant SRT equation or KSRT (where we use 'K' for cumulants [20,53] due to the fact that 'C' has been established for central moments in the lattice Boltzmann context and because cumulant is written with 'K' in many European languages). Further, we denote the BGK operator with the same equilibrium as the cumulant operator as BGK+ in order to distinguish it from the standard lattice BGK model which uses a Taylor expanded Maxwellian as equilibrium. The KSRT equation is no longer equivalent to the BGK or BGK+ equation even in the continuum. It has, however, never been investigated what consequences these differences have. As we argued before [26], the differences in the asymptotic expansion between the BGK+ and the KSRT method are extremely small. In diffusive scaling differences occur only beyond the order of the leading error such that the accuracy is not expected to be strongly affected. Yet an influence on stability is possible.
Very recently, Coreixas and co-workers published their analysis of the KSRT method in two dimensions and found little difference to other SRT collision models with regard to nonlinear stability [20]. Quite interestingly, the KSRT method had a smaller range of stability than raw and central moment methods when studied theoretically with a von-Neumann analysis. However, this behaviour was no longer observed in actual simulations where the effective stability range appeared to be larger than implied by the linearized von-Neumann analysis. It is of course of note here that the primary difference between cumulants and moments is their nonlinear behaviour. Thus, it is not surprising that von-Neumann analysis gives an incomplete picture for the performance of the cumulant method. We also note that pronounced differences between moments and cumulants appear in three dimensions as we have repeatedly argued [21,26]. Differences between cumulants and central moments are so small in two dimensions that we have abstained from presenting a two-dimensional cumulant method with the exception of a model used for solving the shallow water equation [54]. Finally we note that the cumulant method analysed by Coreixa et al. [20] differs from the cumulant method proposed by us due to their omission of the Galilean invariance correction. The cumulant lattice Boltzmann method unfolds its full potential only if combined with the parametrization for fourth-order accuracy. Multiple relaxation time models are frequently criticized for having unknown parameters which the user cannot easily deduce from the physics of the problem. In the case of the cumulant method optimal relaxation rates for third-order cumulants have been found [26]. It is correct that this leaves relaxation rates for cumulants of order four and higher undetermined. However, the influence of these higher-order cumulants is small compared to the third-order cumulants at least in the asymptotic expansion. The reason why it is difficult to predict the influence of higher-order relaxation rates is exactly because this influence is small.

Galilean invariance correction
Most lattice Boltzmann models using a D3Q27 velocity set or one of the other standard lattices suffer from a velocity-dependent viscosity [55]. This problem is common to all models which do not recover the full velocity moment tensor of third order and are equipped with an at least second-order expansion of the equilibrium function. This includes the standard BGK method [35], classical MRT models [16], regularized lattice Boltzmann models [22], recursively regularized lattice Boltzmann models [23], the classical entropic lattice Boltzmann model [25], the KBC model [24] and classical central moment models [17,18]. The problem of velocitydependent viscosity is not present in models with increased isotropy such as those using a body-centred cubic lattice [8]. Using the standard D3Q27 lattice, the problem can be removed by a modification of the relaxation rate [47] or the equilibrium moments [21]. This correction is usually applied in the cumulant method and has also been included into some central moment methods [19]. However, in their recent work Coreixas and co-workers analysed cumulant methods omitting the correction [20]. Consequently, their conclusion that only the equilibrium function affects the accuracy of the method does not apply to our cumulant method or to other methods incorporating the correction. In [21] we showed how the same correction can be incorporated into the BGK and the MRT method. Incorporating the same correction into entropic and regularized schemes would be trivial but has not been done to the best of our knowledge. Despite the Galilean invariance of the viscosity in the cumulant scheme, the advection operator is still discretized with only second-order accuracy. This is evident from the phase lack of a planar vortex in a superimposed constant velocity field (e.g. Fig. 6 in [21]). The reason for this error is in the aliasing of fourth-order cumulants due to the finite number of discrete velocities [32]. This error is small when compared to other defects in classical lattice Boltzmann models. However, compared to the fourth-order accurate diffusion term of the cumulant method with optimal relaxation rates it becomes non-negligible. An effective way to overcome this error is to add a correction term to the equilibrium of second-order cumulants [32]. This correction term requires a second derivative of the velocity. It would be very convenient to obtain this second derivative from the non-equilibrium odd-order cumulants (third and fifth order) which is theoretical possible but apparently unfeasible due to the poor conditioning of these cumulants. An obvious but more costly solution to obtain the missing derivatives is the application of finite differences. With this extension, fourth-order convergence of the advection can be obtained even in combination with fourth-order convergent diffusion. It has, however, been observed that the advection correction reduces stability [26]. In the current paper we use a slightly modified version of the correction which we found to be more robust.
The leading error in Galilean invariance (i.e. advection) in the lattice Boltzmann method on the D3Q27 lattice arises from the aliasing of the fourth-order cumulants c 400 , c 040 and c 004 which is found to be [32]: Here = x = t 1/2 is the diffusive scaling parameter, θ = 1/3 is the dimensionless temperature, and c 100 = u is the velocity in x-direction. The cumulant c 400 is seen to have a spurious dependency on the velocity which breaks Galilean invariance. This deviation from true Galilean invariance is rather small, and it is ignored in almost all lattice Boltzmann models as these models are typically only second-order accurate anyway. If we aim for fourth-order convergence rate, this error can no longer be ignored. One way to remove it [32] is to add a correction term to the equilibrium of the second-order cumulant, i.e.: Here ν denotes the kinematic viscosity. Note that we did use cumulants in normalized form which is why no density appears in (6). The term in {·} in (6) is to compensate for the aliasing of the third-order cumulant c 300 and removes the dependency of the viscosity from velocity (see Appendix H in [21] and see also [47] where the viscosity but not the phase is corrected in a BGK context). The derivative ∂ x u can be obtained from the non-equilibrium second-order cumulants which makes this correction non-invasive. The terms in [·] compensate the aliasing of c 400 and remove the phase shift. This correction depends on a second derivative ∂ x x u. Also the viscosity independent prefactor of 1/6 is rather large. Previously we used finite differences to determine ∂ x x u and computed ∂ x u from the non-equilibrium second-order cumulants. This resulted in a noticeable reduction of stability (see section 7 in [26]). One reason for this could be that (∂ x u) 2 and u∂ x x u are computed by different and possibly incompatible approaches. Both terms actually arise from the same aliasing of the fourth-order cumulant and can be rewritten as: The correction can hence be determined in different ways. For the current paper we decide to obtain the central finite difference of u∂ x u where ∂ x u is obtained from the non-equilibrium second-order cumulants. The complete algorithm including all corrections is given in "Appendix A".

Conditioning
Fully explicit numerical methods can be sensitive to the build-up of round-off errors. The kinetic approach of the lattice Boltzmann method is particularly vulnerable due to the fact that what is governed by gradients of macroscopic variables in the Navier-Stokes equation is contained in the successive hierarchy of moments or cumulants of the velocity distribution. These cumulants replace the spatial dependencies in the Navier-Stokes equations by the local dependencies on the memory of the flow. This memory composed of cumulants is distributed across all distributions which has severe implications on the conditioning of the memory terms. Cumulants of succeeding order are asymptotically smaller than cumulants of lower order. Each distribution f i jk is a composition of the cumulants of the various orders such that observable quantities of asymptotically different size share the same floating point variable. As a result, the higher-order cumulants can be poorly represented numerically, and the relevant bits they are represented by within a float or double variable may sensitively depend on the conditioning of the algebraic expressions used in a specific implementation. An additional problem is that cumulants are by no means easily computed. Obtaining them directly from the definition as the series expansion of the logarithm of the Fourier transform of the distribution function would in fact become prohibitive. An obvious way to compute cumulants is to apply the chain rule to the derivatives of the cumulant generating function to obtain expressions for cumulants as a function of the derivatives of the moment-generating function. In this way, cumulants can be computed from e.g. central moments. However, the relationships between cumulants and moments can be identified as functions of quickly growing combinatorial complexity, such that cumulants of successively higher order are not only decreasing in magnitude very quickly, but also require successively more floating point operations to be computed. This all leads to a rather poor numerical conditioning of the procedure. In fact, some early attempts on building lattice Boltzmann methods based on cumulants were only partially successful [56,57]. The feasibility to implement a cumulant lattice Boltzmann model depends on a series of measures to improve the conditioning of the cumulant transform such that the build-up of round-off errors is sufficiently suppressed. The three most important of these measures are the subtraction of the background density from the distributions, the utilization of asymptotically ordered pairwise sums in the transformation and the introduction of the Chimera transform for the computation of central moments.
The subtraction of the background density from the distribution simply removes the constant part of f i jk such that the distribution is no longer a positive function but one that varies around zero. This simple adjustment leaves more significant digits for the higher cumulants in the floating point numbers.
Pairwise summation is a method to reduce round-off errors in long sums compared to naive summation without causing any computational overhead. In the transformation from distributions to moments and vice versa, many sums have to be calculated. One simple example is the computation of the velocity in x direction which is naively done this way: This naive calculation is not ideal since the different f i jk tent to be of different magnitude and they are summed in arbitrary order. In a naive summation of n summands the round-off errors grow like O(n). This can be improved to O(log n) at the same computational cost by pairwise summation [58]. In addition the a priori knowledge of the size of the different distributions can be used in order to group them such that the partial sums are added in increasing size. The computation of the velocity then reads: Here the over bar in the index indicates a negative lattice direction. The most appropriate ordering of the different partial sums can be difficult to find. While naive summation and pairwise summation have the same computational cost but different accuracy, the Chimera transform has the advantage that it decreases both computational cost and round-off errors. The purpose of the Chimera transform is to efficiently compute central moments from distributions and back. A naive way of computing central moments κ αβγ is by: where c is the lattice speed and it is assumed that 0 0 := 1. It is faster and more accurate to split up the three sums computing Chimeras (partly distributions and partly moments) first: This transformation does not only reduce the cost of the computationally most expensive part of the collision operator, it also reduces round-off errors quite significantly. To understand this, we should first recall that the computation of moments from discrete distributions is a simplified discrete Fourier transform. The discrete Fourier transform can be substantially accelerated by splitting it into partial sums leading to the fast Fourier transform (FFT). The fast Fourier transform is called this way because its most obvious advantage over a naive discrete Fourier transform is its reduced computational complexity. In addition, it also has the perhaps even more useful property of being more accurate than the naive transform [59]. In order to understand this, we have to recall that a naive discrete Fourier transform is computed by summing up a large amount of numbers in direct succession resulting in a cumulative round-off error proportional to O(n), whereas the hierarchical summation of pairs of sums in the fast Fourier transform only leads to round-off errors growing with the logarithm of the number of summands. For the same reason the Chimera transform has smaller round-off errors than the naive moment transform as will be demonstrated with a small numerical experiment. Figure 1 shows the accumulated absolute root-mean-square error due to successive applications of the naive and the Chimera central moment transform of the distribution from a randomly chosen lattice node. Successive forward and backward transformations without the application of the collision operator are applied, and the result is  (11). For the transformation back to distributions the inverse Chimera transform was applied in both cases. While both methods show the same growth rates with repeated application of the forward and backward transformation cycle, the absolute difference in the errors remains constant at a value of 2400 which is equivalent to a relative loss of 11 binary digits only due to the transformation. This plot is shown here to emphasize the importance of the Chimera transform for the successful implementation of a cumulant method compared to the original distribution to compute the error. Note that only a single distribution function was chosen and the result could be slightly different in different cases. However, the discrepancy between the two methods is so drastic that the advantage of the Chimera transform over the naive transform becomes obvious. Both methods show a linear growth in the error with successive repetitions of the transformation such that even the Chimera transform is far from ideal and further improvements are desirable. However, compared to the naive transform the error is systematically three orders of magnitude smaller. More specifically, the loss of precision due to the naive transform as compared to the Chimera transform accounts for 11 bit in the mantissa which is close to half of the 23 bits of the mantissa of an IEEE-754 single precision floating point number [60]. It is hence obvious why the Chimera transform plays such an important role in the practical implementation of the cumulant method. The reduction in computational cost is also considerable. The naive forward transform requires a total of 1482 binary additions and 1458 binary multiplications. The Chimera transform needs only 270 binary additions and 162 multiplications. A similar comparison applies to the back-transformation.

Turbulence modelling
When simulating turbulent flows, it is often unfeasible to resolve all hydrodynamically relevant length scales down to the Kolmogorov scale. The nonlinear advection term in the Navier-Stokes equation is responsible for the transfer of kinetic energy between different wave numbers. Only wave numbers that can be resolved by the grid are captured by the method. Turbulence models are applied to capture the effect of the unresolved scales. A popular method to do this is to add an eddy viscosity, a viscosity caused by the turbulence in the unresolved scales. Large eddy simulation (LES) models add the eddy viscosity to a transient simulation which is supposed to accurately capture the dynamics of the resolved (large) eddies. Eddy viscosity models can only dissipate energy, while in reality energy might flow also from the unresolved to the resolved eddies (the so-called back-scatter). However, this is often deemed to be sufficiently rare to ignore it.
The fundamental problem in building an eddy viscosity model is that the small eddies on which the model depends are not known. The model hence has to infer the unknown from the known, i.e. from the dynamics of the large eddies. How this should be done is far from clear. Apart from the technical issue of making an eddy viscosity model work there is a philosophical issue in eddy viscosity modelling which appears to be dominating the direction of research. While most eddy viscosity models are described as "empirical", it has to be noted that academic publications and educational material usually put a strong emphasize on the positivistic aspect of the theory behind the empirical models. The positivistic perspective is that new knowledge is to be inferred from secured and perceivable facts. Arguing that turbulent viscosity emerges from the mixing of small eddies is a positivistic deduction. In this positivistic perspective at least the structure of the eddy viscosity model requires a physical interpretation in order to be legitimate. Only the model constants can be obtained by induction from experimental data (i.e. empirically). Despite its popularity, the positivistic perspective on eddy viscosity models stands on very thin ice scientifically. Among its many fault lines is the hypothesis that the unknown sub-scale turbulence can be inferred from the known large-scale motion in the first place. A more practical problem is that the positivistic perspective requires the eddy viscosity model to be deduced from physics rather than from numerics. In reverse, a sub-grid scale eddy viscosity model together with its modelling constants would have to be independent from the underlying solution method, i.e. whether the flow is solved with control volume methods, finite element methods or lattice Boltzmann methods. This, however, is mathematically implausible for the following reasons: Most contemporary sub-grid scale models are based on filters with a filter width proportional to the grid spacing. Also, the turbulent viscosity used in many such models (e.g. the Smagorinsky [61] and the WALE model [62]) depends on the shear rate and/or the vorticity in a way that does not allow for scale separation between the influence of the sub-grid scale model and the truncation error of the method. To be more specific we take the Smagorinsky model [61] as an example. The turbulent viscosity ν T is given as: The bar indicates the filtered velocity component and C is assumed to be a constant. The filter width is usually taken proportional or even equal to the grid spacing ∝ . From a numerical point of view it appears problematic to combine a turbulent viscosity of O( x 2 ) with a second-order convergent Navier-Stokes solver as is unfortunately often done. The problem is that the truncation error of the solver is of the same size as the influence of the turbulent viscosity. Considering the turbulent viscosity in the Navier-Stokes equation as a physical quantity does not make sense since, from a numerical point of view, it has to be absorbed into the truncation error. From this it is also evident that (contrary to the positivistic perspective of physically determined model constants) different discretization methods require different eddy viscosity models and/or different model constants as the effective filter is always a blend between the imposed sub-grid model and the unavoidable truncation error. From this insight arises a different concept of sub-grid scale models, the so-called implicit large eddy simulation (ILES) [63,64]. The idea behind ILES is that since there is no scale separation between the turbulence model and the truncation error, a well-behaved truncation error can be used to mimic a turbulence model. One advantage of an explicit sub-grid model over an implicit one is that the former offers some adjustable parameters through which the model can be calibrated to match experimental data. It is hence possible to do without the positivistic physical interpretation of a turbulence model and judge it only by its ability to match experimental data.
In addition to the positivistic and the purely empirical perspectives there is a third perspective, which originally played a dominant role for lattice Boltzmann simulations. Before the advent of stable collision operators the lattice Boltzmann method would usually develop instabilities when applied to flows of large Reynolds number. This problem could be solved by adding a dissipative sub-grid scale model. From this perspective, the role of a sub-grid scale model is to keep the simulation stable in the absence of sufficient resolution.

WALE model
Despite the fact that modern lattice Boltzmann models based on central moments or cumulants require no turbulence model for stabilization at very high Reynolds numbers, and despite the fact that there is no scale separation between the influence of the truncation error and the sub-grid scale model, explicit LES models have been advertised as an improvement to lattice Boltzmann models. Here we refer explicitly to the Wall Adaptive Local Eddy (WALE) model [62] which has recently become a popular additive in commercial lattice Boltzmann codes. Unlike the Smagorinsky model, the WALE model computes the turbulent viscosity not only from the shear rate but also from the vorticity. The turbulent viscosity reads: with ε being a small number avoiding division by zero and the other terms with implied summation over double indexes defined as:S The WALE model is supposed to overcome the over-damping of the Smagorinsky model in the vicinity of walls where the turbulent viscosity based on shear rate alone is unphysically high. This is traditionally mitigated by applying a damping function near the wall as introduced by van Driest [65]. However, this has limitations in complex geometries as it requires knowledge of the friction coefficient [62]. The idea behind the WALE model is hence to essentially turn the sub-grid model off at the wall.
In our implementation the vorticity and the shear rate are computed by simple central differences. The filter width is identical to the gird spacing (i. e. = x), and the constant is chosen to be C w = 0.5 in accordance with recommendations for isotropic turbulence [66,67].

Limiters
An unconventional alternative to explicit turbulence modelling is to enforce stability by applying limiters as done in flux-corrected schemes [68]. A similar approach can also be applied in the case of the lattice Boltzmann method.
In [26] it was observed that the cumulant method with optimized relaxation rates had a smaller range of stability than the regularized cumulant method. The same paper proposes a limiter that restores stability while sustaining the order of convergence. The idea of the limiter is to adjust the relaxation rate ω assigned to a scaled cumulant C by: Here C is any of the third-order cumulants specified in "Appendix A" [Eqs. (A.22), (A.23) and permutations thereof], and ω is any of their respective relaxation rates as used in Eq. (A.63) to (A.65). The parameter λ is a soft limit for the cumulant C . The limiter is ineffective as long as ρλ |C |. As |C * | gets larger, ω approaches one. In [26] stability was restored by setting λ = 0.01 for all third-order cumulants. This is an arbitrary value, but despite its heuristic motivation it leads to satisfactory results. For example, the limiter with this value was used to successfully simulate the drag crisis behind a sphere in [69]. The value λ * is always a compromise between stability and accuracy.
The limiter behaves similar to a sub-grid scale model in the technical sense that it adds dissipation locally where instabilities could emerge otherwise. It is however not based on any positivistic physical considerations. Also it has a strictly different asymptotic behaviour from an eddy viscosity model such as Smagorinsky or WALE. As explained in section 6 of [26] and confirmed in section 7 therein, the limiter is designed to add dissipation at the order of the leading error of the fourth-order convergent scheme, whereas the eddy viscosity approach adds dissipation at the leading order of the second-order convergent scheme. In signal processing terms, it can be said that the low-pass characteristic of the limiter is steeper than the low-pass characteristic of the eddy viscosity model.

Investigated models
During the evolution of the cumulant method many different variants of the model were presented. One goal of the current study is to investigate differences of these models through the numerical benchmark of a decaying three-dimensional Taylor-Green vortex. Even though classical MRT models [14][15][16]70] based on raw moments and central moment-based models [17,18,46,[71][72][73][74][75] including factorized central moment methods [76,77] are part of the heritage of the cumulant method, we abstain from including them into the current analysis. This is mostly because the deficiencies of these models have already been widely discussed (see in particular our discussion in [21]). Models considered here are the cumulant method with regularized ghost modes as published in 2015 [21] denoted by K15, the cumulant method with optimized relaxation rates for third-order cumulants published in 2017 denoted by K17, the optimized cumulant method with advection correction using three additional finite differences denoted by KF3, the latter two methods equipped with a stabilizing limiter Like K17 with WALE model Table 2 Leading error terms of the cumulant methods according to the analysis in [26,32] KSRT K15 K17 KF3 The spatial derivatives in the right column are spurious terms added to the left-hand side of the Navier-Stokes equation with the model-specific prefactors listed in the other columns. The error terms were determined in diffusive scaling but are listed here in terms of grid spacing x and time step t. The viscosity ν is given here in physical units denoted by K17L and KF3L and the K17 method equipped with a WALE sub-grid model denoted by K17W. It is also common to include a BGK variant of the lattice Boltzmann scheme in such discussions. The standard lattice BGK method is notorious for its limited range of stability and its susceptibility to grid scale oscillations which makes it an easily beaten competition. The standard BGK can be improved significantly by replacing the Taylor expanded equilibrium by the equilibrium derived from vanishing cumulants. Here we term this method BGK+.
In [26] we showed that numerical errors for simple test simulations could be improved by up to two orders of magnitude by using the modified equilibrium. In addition, numerical stability was substantially improved. In this paper we consider only the BGK+ method together with the cumulant method with a single relaxation time denoted KSRT. Both methods use the same stencil, the same relaxation rate and the same equilibrium. The nomenclature for the models under consideration is listed in Table 1. Reference implementations for the cumulant method are available for download within the open-source framework VirtualFluids [78].

Predicted errors in the cumulant lattice Boltzmann method
The leading error terms in the cumulant lattice Boltzmann method have been computed analytically by Taylor expansion [26] and through aliasing relations of the asymptotic expansion [32] for diffusion and advection errors, respectively. The current paper is focused on numerical rather than analytical assessment of the methods.
Here we only compute the respective error terms by inserting the parameters of the respective cumulant methods and list them in Table 2. Without loss of generality all error terms are listed only for the Navier-Stokes equation governing the evolution of u (i.e. for the x-velocity). Error terms in the remaining directions can be obtained by exchanging the indices. The leading error terms comprise various spatial derivatives of velocity and their prefactors which dependent on the model parameters. Table 2 provides some insight in the expected behaviour of the different methods. While all errors in Table 2 are second order in diffusive scaling (i.e. t ∝ x 2 ), they are of varying orders in acoustic scaling ( t ∝ x). The fact that the grid spacing x appears in the denominator indicates divergence of the method for x → 0 if t = const which is a manifestation of the diffusive CFL condition. While it is generally accepted for a numerical method that the grid spacing cannot be made arbitrary small without making the time step small too, it would be desired that at least the time step could be made arbitrary small at fixed grid spacing. This, however, is not the case for the lattice Boltzmann method with fixed relaxation rates for non-hydrodynamic moments [79] of which the K15 method is an example. From the table the reason for this unfavourable behaviour becomes clear: Some of the error terms of the K15 method have t in the denominator. In diffusive scaling this is compensated through the x 4 in the nominator, but at fixed x the method will obviously diverge for vanishing t. It is seen that the leading-order error in the KF3 method does not disappear entirely as already discussed in [26]. The fourth-order convergent cumulant method is hence not strictly fourth-order convergent in an orthodox sense. However, the appearance of the viscosity taken to the cube renders the remaining error essentially non-existing for simulations of turbulent flow where ν is small.

Three-dimensional Taylor-Green vortex
The Taylor-Green vortex is a synthetic flow field that fulfils the time-dependent incompressible Navier-Stokes equations. In two dimensions the vortex is stable and keeps its shape, while dissipating energy. This is due to the fact that vorticity in two dimensions becomes a scalar which cannot be generated by the nonlinear term in the (two-dimensional) Navier-Stokes equation [80].
In three dimensions vortex stretching occurs and energy is transported from large scales to smaller scales, such that new small-scale vortices appear, while the base vortex decays. Since this is a cascading process, the flow transitions to turbulence. An exact initial condition can be specified, which allows comparisons of different solution methods.
The Taylor-Green vortex benchmark has been used in connection with many numerical methods, often considering the performance of the methods for under-resolved flows. Diosady and Murman [81] used the Taylor-Green vortex to measure the computational cost associated with obtaining a given level of accuracy for discontinuous-Galerkin finite element methods with different orders of the ansatz functions. They found that a sixteenth-order spatial discretization with only 48 3 elements had a much better accuracy than a second-order method with 256 3 elements and had a three orders of magnitude lower computational cost. Flad et al. [82] introduced an adaptive filtering technique for high-order discontinuous-Galerkin methods in order to reduce aliasing artefacts and used the under-resolved Taylor-Green vortex simulations to assess the performance of their method. Bull and Jameson used the Taylor-Green vortex to investigate the performance of a high-order flux reconstruction scheme and found that using high-order ansatz functions can lead to oscillations [83]. Piatkowski et al. used the Taylor-Green vortex test case to benchmark their discontinuous-Galerkin-based splitting method [84]. Wiart and Hillewaert used the Taylor-Green benchmark on the discontinuous-Galerkin solver Argo using different kinds of meshes [85]. Kulikov and Son used the Taylor-Green test for assessing the performance of the CABARET scheme for under-resolved simulations [86]. Lee et al. used the benchmark for their low-dissipation solver based on OpenFOAM [87]. The same benchmark was also used in several studies using different variants of the lattice Boltzmann scheme [88][89][90], the semi-Lagrangian off-lattice Boltzmann method [3], the lattice kinetic scheme [91] and GKS [92,93].
The test case of a three-dimensional Taylor-Green vortex at fixed Reynolds number is chosen here for the following reasons: • The test case is well studied, and reference solutions for energy decay, enstrophy evolution and energy spectra are readily available from the literature. Also it is possible to compare our results to results obtained with other methods including results not yet published as the test case is rather popular. • The specific Reynolds number of 1600 is chosen as it is a case for which reference data are not only available for the increasing enstrophy but also for decaying enstrophy. In addition, the Reynolds number of 1600 is the highest for which single relaxation time models provide a stable solution using feasible resolutions.
• Studying a single test case allows a dense sampling of different simulation parameters.
For our investigations we follow the setup of Wang et al. [94] which was also used by Jacobs et al. [95]. The flow is solved in a periodic domain −π L < x 1 < π L, −π L < x 2 < π L and −π L < x 3 < π L, where L is a reference length. For incompressible flow the physical parameters are defined by the Reynolds number, which is chosen as The initial flow field is given by: The initial pressure field is set implicitly via the density field. The evolution of the flow field is observed for 20t * , where t * = L/U is the reference time. We define a base time step t 0 for the simulations which corresponds to t * = 250 t 0 at a spatial resolution of 64 3 Table 3 together with the velocity in lattice unit and the approximate Mach number.

Numerical results
Here we use the three-dimensional Taylor-Green vortex to study the performance of the cumulant lattice Boltzmann model. The results from Wang [94] are taken as a reference solution which we consider to be sufficiently accurate for the purpose of this study. We investigate the influence of various parameters and modelling options on the performance of the simulation.
To this end, we primarily observe the time evolution of integral kinetic energy and integral enstrophy. The time evolution of the integral kinetic energy characterizes the dissipation of the flow. It is calculated by with being the volume of the computational domain, i.e. = (2π L) 3 . The integral enstrophy E (which is proportional to the integral energy dissipation rate ), is a measure for the dissipation of kinetic energy and can be computed in two different ways which should yield identical results for incompressible flow [94]. First, the enstrophy is proportional to the time derivative of the kinetic energy [96], i.e.
Second, the enstrophy can be computed locally as the square of the vorticity ω = ∇ × u [94,95], such that the integral enstrophy is given by The former relation is strongly coupled to the dynamic process of dissipation, whereas the latter is purely kinematic. Indeed the latter is a measure for how strong small-scale structures are represented in the flow field. For spatially discretized flow simulations, this is an important measure, since some numerical methods tend to smear out small-scale structures on the grid scale.

Integral kinetic energy decay and enstrophy decay
As a first test we investigate the decay of the integral kinetic energy and the enstrophy. Unless otherwise stated we obtain the enstrophy from computing the vorticity with eighth-order accurate finite differences. Figure 2 shows the decay of kinetic energy for several lattice Boltzmann models. Figure 3 shows the corresponding enstrophy evolution.
It is observed from Figs. 2 and 3 that the scaling (diffusive or acoustic) is of minor relevance for both the energy and the enstrophy. At all resolutions the BGK+ methods follow the decay of kinetic energy and the growth of enstrophy of the reference closely in the beginning. The coarse resolution becomes unstable after t * ∼ 10 irrespective of the Mach number. As could be expected, we observe that the coarse grid simulations underestimate the peak enstrophy quite considerably. Enstrophy is dominated by the smallest resolved scales and the coarse simulation cannot resolve all scales. This under-prediction of enstrophy in under-resolved simulations is common to all methods. This makes enstrophy a measure of the minimum feature scale that can be resolved by a given method for a given grid resolution.
It is quite interesting to compare the BGK+ simulation to the KSRT simulation which shares the same lattice, the same equilibrium and the same single relaxation time. The two cases with the higher resolution show very similar results. However, the KSRT method remains stable also for the coarse simulation. This is interesting, as it could indicate that the stability of the cumulant method does not only originate from the equilibrium and the damping of higher-order modes but it also persists for the single relaxation time version. However, one should be careful to draw conclusion from a single observation.
We note here that Nathen et al. studied the same test case with a standard BGK model and 19 instead of 27 velocities [88]. They also observed instability for the BGK model at a spatial resolution of 64 3 lattice nodes. However, in their case the instability set in much earlier than in our case, indicating that the BGK+ method is more stable than the regular BGK method.
Next we test the cumulant lattice Boltzmann method in regularization mode, i.e. by setting all relaxation rates irrelevant for shear viscosity to one. This is the set of relaxation rates used in [21] and denoted here as K15 in Figs. 2 and 3. We observe that the deviation from the reference simulation is much larger for K15 than for the KSRT and BGK+ simulations. This is obviously due to a spurious damping which is seen by the rather quick decrease in integral kinetic energy and a lower peak in enstrophy than observed in the single relaxation time methods. Unlike the single relaxation time methods, the regularized cumulant method shows a notable dependency on the Mach number. We also point out that the deviation from the reference increases with decreasing Mach number which is an undesired effect as it implies that the method becomes less accurate if a smaller time step is used. The reason for this behaviour is clearly seen from the leading error term as given in Table 2. Several of the spurious fourth-order spatial derivatives of velocity have prefactors of the form O( x 4 / t) which are second order in diffusive scaling, but divergent if the time step is reduced for fixed spatial resolution. This effect is well known to exist in lattice Boltzmann methods with multiple relaxation times and is not particularly related to cumulants. It was first described by Dellar [79] for a multiple relaxation time method using a raw moment basis and was further discussed in the context of cumulants in [26].
The Mach number dependence of the regularized cumulant method is easily overcome by the optimal relaxation rates derived in [26] and denoted as K17 in Figs. 2 and 3. This is also evident from the absence of any error terms with t in the denominator for the K17 method in Table 2. The optimal relaxation rates were obtained in [26] from a linearized asymptotic expansion. They increase the convergence order of the diffusive term in the Navier-Stokes equation provided that the Reynolds number is sufficiently large so that ν 3 ∼ 0. It is not self-evident that this should also increase the accuracy of the method in the under-resolved case as a higher Here, however, we see that the optimized cumulant method deviates less from the reference solution across all studied resolutions than the KSRT method. This, in fact, indicates that setting all rates to the same value is not the best choice and it is also not necessary for the elimination of the dependence on the Mach number. We also note that the deviations from the reference solution for the coarser simulations are quite similar to the ones observed in the KSRT method. However, at the highest resolution the correspondence between the solution obtained by the optimized cumulant method and the reference solution is significantly closer than that between the highest resolution KSRT simulation and the reference, as should indeed be expected from an increased order of convergence. When comparing the enstrophies resulting from the simulations based on the regularized cumulant method K15 to the ones of the optimized method K17, we note that the optimized method with resolution 64 × 64 × 64 shows a striking correspondence to the regularized cumulant method with resolution 128 × 128 × 128. The same can be said for the optimized method with resolution 128 × 128 × 128 and the regularized method with resolution 256 × 256 × 256. The results for the cumulant method with optimal parametrization and advection correction (KF3) are also shown. They look quite similar to those from the K17 method, indicating that the remaining error is not dominated by advection in the current setup.
In Figs. 2 and 3 we also show results for the cumulant method enhanced by the WALE sub-grid model denoted as K17W. All relaxation rates were set according to the optimal set of relaxation rates (i.e. K17). It is observed that the WALE model behaves very similar to the regularized cumulant method during the early onset of the decay of integral kinetic energy and considering the reduced peaks in the enstrophy. However, the WALE model is insensitive to changes in the Mach number, a property inherited from the K17 method. Here, the similarity with the results from the optimized cumulant method without sub-grid scale model (K17) at halve the resolution is even more striking. The effect of adding a WALE model to the optimized cumulant method appears to result in reducing the effective resolution by a factor of two. This does not come as a surprise, as it is the very purpose of the sub-grid model to filter out eddies at the filter scale which also coincides with the grid spacing. Thus, a sub-grid scale model that reduces the effective resolution by a factor of about two does exactly what can be expected. Whether this is actually useful for efficient simulations is of course a different question.
In Fig. 4 energy and enstrophy evolution for the K17L and KF3L method using a limiter for the thirdorder cumulants are shown. The purpose of the limiter is to enhance stability which is not necessary in the current setup. The results indicate some additional damping coming from the limiter, in particular at the coarser resolutions. It is noteworthy that the high-resolution case is only very weakly affected.
When comparing the models with limiters (K17L and KF3L) and the one using a turbulence model (K17W) with the unmodified models, it is important to note that the limiter and the turbulence model are parameterized and the result would change by choosing a different parameter. However, we note that the WALE model adjusts the viscosity at the order of the leading error of the second-order scheme, whereas the limiter acts at the leading error of the fourth-order scheme. It is therefore to be expected that the limiter shows less effect at low wave numbers compared to the WALE model, as will be demonstrated below.
So far we computed the enstrophy using Eq. (22) but as discussed above the enstrophy can also be computed from the dissipation rate according to Eq. (21). Here the dissipation rates are computed by central differencing  21)] is seen to be essentially insensitive to the resolution for all considered methods. The discrepancy between the enstrophy and the dissipation rate will in the following be used for determining an effective viscosity.

Energy balance and effective viscosity
In order to gain quantitative insight into the accuracy to which our under-resolved simulations capture the actual physics of the problem, we compute the effective viscosity. The effective viscosity can be computed using the balance equation of kinetic energy [97,98]. The dissipation of kinetic energy E is balanced by the enstrophy as [96]: The time evolution of the kinetic energy (Fig. 2) is better captured than the enstrophy (Fig. 3) which is biased towards the smallest resolved scales. Figures 7 and 8 show the effective viscosity of the lattice Boltzmann schemes over time. The time derivative of the kinetic energy was obtained by central differences using a time increment of 20 time steps. We observe for all cases that the effective viscosity approaches the nominal viscosity when the resolution is sufficiently large and that the performance of the K15 method deteriorates for smaller time steps at fixed spatial resolution. A distinctive feature only seen in the WALE simulation is that the effective viscosity for the lowest resolution deviates from the nominal value even in the beginning of the simulation. Comparing the results for K17 and KF3 with K17L and KF3L in Figs. 7 and 8 also provides an indication of the effect of the advection correction. While the methods show a similar behaviour at later times, it is seen that the models with advection correction capture the nominal viscosity much better in the initial phase of the simulation, in particular when a coarse grid is used.

Energy spectra
In Figs. 10 and 11 we display the energy spectra of the investigated methods. We only show spectra at t = 10t * which is briefly after the enstrophy reaches its maximum over time. At a Reynolds number of 1600 the flow is not highly turbulent such that the Kolmogorov E ∼ k −5/3 behaviour is observed only in a limited range of wave numbers. Using the enstrophy of the reference simulation at t = 10t * , which is E ≈ 9(t * ) −2 , we can estimate the Kolmogorov length η using the initial Reynolds number to substitute the viscosity: As the domain measures 2π L in each direction, the Kolmogorov length measures 0.12 x, 0.24 x and 0.49 x for the 64 × 64 × 64, the 128 × 128 × 128 and 256 × 256 × 256 resolutions, respectively. We hence see that none of the meshes reaches DNS quality albeit they are also not under-resolved by more than an order of magnitude. Figure 10 shows the results for the methods K17, KF3, K17L, K17W and K15 computed with the MATLAB code provided by Felix Dietzsch [99]. The energy spectrum is computed by integrating over shells of equal wave number [100]. The results are in excellent agreement with those from Foti and Duraisamy [101] showing data at the same time step as we do. The results from Foti and Duraisamy were digitized from a low-resolution figure in [101] and plotted here only for k ≤ 128/(2π L), while in the source they were plotted for k ≤ 128 √ (3)/(2π L). It appears to be common in literature [101,102] to plot the energy spectrum up to the highest theoretically available wave number 2 √ 3/ x. We however decided to plot results only up to 2/ x due to the reasoning illustrated in Fig. 9. Due to the sampling theorem for real-valued discrete data sets, relevant information is only contained up to k = 2/ x in each Cartesian direction of the  Sketch of the energy spectrum in Fourier space for two different resolutions. Only two dimensions are shown for clarity. Due to the sampling theorem, information is only contained in the lower quarter (octant in 3D) of the spectrum. The energy E(|k|) is obtained by integrating over spherical shells. It is common in the literature to show results integrated up to k max = 2 √ D/ x. However, we plot them only below 2/ x due to the fact that this is the highest wave number for which a complete energy shell is present. Plotting results of the low-resolution simulation with x 64 alongside results of the high-resolution simulation with x 128 at the depicted k would be misleading since the two results would be shown for different portions of the energy shell and would hence not represent the same physics Fourier transform. The highest wave number is therefore k max = 2 √ D/ x with D = 3 being the number of dimensions. However, a complete isotropic energy shell is only available up to k max Shell = 2/ x as can be clearly seen in Fig. 9. Plotting results of a low-resolution simulation next to a result of a high-resolution one beyond 2/ x low Resolution is therefore misleading as the low-resolution result would only be integrated over a partial energy shell, whereas the high-resolution result would be integrated over a full energy shell.
For the K17, KF3 and K17L methods at the highest resolution there is essentially no deviation from the spectral results up to a wave number of 2π Lk ≈ 80. The KF3 method, which has the highest theoretical accuracy among the tested methods, also shows the highest consistency between results of different resolution albeit the difference to the K17 method is not large. The limiter in the K17L method acts like an eddy-viscosity turbulence model by adding dissipation to higher wave numbers. This is similar to the explicit sub-grid scale model in the K17W method which dissipates the higher wave numbers to a larger extent than the K17L method. A quite significant difference between the limiter (K17L) and the sub-grid scale model (K17W) is in the steepness at which the spectra deviate from the reference. Even though both methods drop to comparable values at the highest resolved wave number, the K17L method is seen to drop much steeper. This indicates that the limiter affects only the high wave number frequencies, while the sub-grid scale model also affects intermediate wave numbers. This behaviour is to be expected as the two models act on different asymptotic orders. The dissipation added by the limiter was designed [26] not to compromise the fourth-order convergence of the cumulant method, whereas the sub-grid model is designed to add an eddy viscosity proportional to O( x 2 ). The dissipation added by the limiter is hence two asymptotic orders smaller (i.e. O( x 4 )) than the eddy viscosity in the K17W method. This naturally results in a much steeper fall-off irrespective of the used constants in the two methods (which are not related to each other).
The K15 method behaves differently from the other lattice Boltzmann kernels in two aspects: While the methods K17, K17L, K17W, KSRT and BGK+ are not sensitive to the time step size, K15 appears to be very sensitive. Secondly, the method displays large dissipation at higher wave number. Dissipation becomes stronger with shorter time steps. The steep drop in kinetic energy for higher wave numbers in the K15 method is similar to the one observed in [91] for the lattice kinetic scheme which is also a regularized lattice Boltzmann method.
The BGK+ and KSRT methods also show good agreement with the spectral reference in the low wave number regime (see Fig. 11). In the high wave number regime, the disagreement with the reference is larger than for the K17 and KF3 methods. The BGK+ method for the lowest resolution is the only case where an over-prediction of the energy is observed, but this is due to the onset of instability leading to complete numerical divergence at later time. In general, the BGK+ and KSRT methods are characterized by low dissipation, which is why they match part of the high-frequency domain of the spectrum better than the K17L and K17W methods. However, at very high frequencies the mismatch grows more rapidly than for the K17L method.

Comparison to regularized and KBC lattice Boltzmann models
In a recent paper Krämer et al. [89] used the Taylor-Green vortex test case for the investigation of a regularized LBM method (RLBM) derived from a pseudoentropy maximization and the KBC method named after Karlin, Bösch and Chikatamarla [24]. Both methods apply regularization to non-hydrodynamic moments in order to improve the stability of the lattice Boltzmann scheme. We contrast this approach with the results of the K17 and K17L methods which use optimized relaxation meant to improve accuracy rather than stability. The limiter in the K17 method also stabilizes the method at higher Reynolds numbers. We note that this is not necessary in the current case, but it is shown here for fairness as the other two methods are explicitly designed for maximum stability. We use the original data from [89] which was computed on a grid with 80 3 nodes and compare them to our results for the same resolution. The advection correction (KF3) is not used here since it requires finite differences and would render the comparison unfair. The methods compared here all share the same D3Q27 lattice without any add-ons. The energy spectra at t = 10t * are shown in Fig. 12. We note that the methods agree well for low wave numbers, but that both the RLBM and the KBC method deviate further from the reference at high wave numbers than the K17 and the K17L method. At the highest resolved wave number the limited K17L method matches up with the KBC method. However, this is due to the KBC method flattening out at high wave numbers.

DNS results
So far all shown results were obtained on under-resolved grids and a small but noticeable residual deviation of the peak enstrophy could even be seen in the K17 and KF3 method. To demonstrate that the cumulant method recovers the correct energy spectrum and the correct enstrophy evolution we repeated the K17 simulation and the K17L simulation on a grid with 512 3 nodes. From Fig. 13 it is evident that, when run at DNS resolution, the reference spectra, the dissipation rate and the enstrophy are faithfully recovered. In particular, the limiter shows no effect when the method is applied at DNS resolution.

Vortical structures
To gain a qualitative overview we also display vortical structures by showing contours of the Q-criterion [103]. For better comparison we restrict us here to the KSRT, K17, K17W and KF3 methods with medium time step and group equal time instances together. At t = 5t * (Fig. 14) the methods agree well and the resolution has only a minor effect as small vortices have not yet developed. The KSRT method exhibits some oscillations at the lower resolutions, while the other methods are essentially smooth. At t = 10t * (Fig. 15) a noticeable deviation of the flow patterns between the methods can be appreciated for the low-resolution cases. In particular, the KSRT method shows essentially noise. At later times and with developing turbulence, the differences between the results become more pronounced (see Figs. 16, 17).  Comparison between the K17 and K17L method with two other recent lattice Boltzmann methods: the regularized LBM (RLBM) and the KBC method. The data for the latter two methods were supplied by the authors of [89]. The RLBM and KBC methods were designed to maximize stability, while the K17 method maximizes accuracy. It is seen that the K17 and the K17L methods deviate considerably less from the reference solution than the other two methods in the high wave number regime  The standard Taylor-Green test case discussed in the previous sections has a comparatively low Reynolds number. In this section we investigate the performance of the cumulant method at higher Reynolds numbers, i.e. 16,000 and 160,000. The usage of a limiter or a sub-grid model becomes necessary at such high Reynold numbers for stability reasons. Figure 18 shows the energy spectrum for the two Reynolds numbers simulated with the K17L method. While no reference solution can be provided, we find good agreement with the ideal k −5/3 law. At larger resolution we observe a deviation from this ideal behaviour. There is a build-up of energy at high wave numbers preceded by a region of reduced energy level at intermediate scales. The same behaviour was observed for under-resolved DNS of an inviscid Taylor-Green vortex simulated with the discontinuous-Galerkin method [104,105]. In the discontinuous-Galerkin method the choice of the flux computation has a strong effect on the shape of the energy spectrum at higher wave numbers. Lax-Friedrichs methods appear to show a larger bottleneck effect (dip in energy) than Roe's scheme [104]. However, according to Flad and Gassner [106] neither choice provides the means to substantially increase the fidelity of the method in the sense of an implicit LES. Some kinetic energy preserving discontinuous-Galerkin schemes behave slightly different at very high Reynolds numbers, only showing a build-up of energy at higher wave numbers without the dip in the intermediate range [82,106]. Inviscid simulations such as those shown in [104,105] assume an infinite Reynolds number for which the ideal k −5/3 -law should hold for arbitrary high Reynolds numbers. At finite Reynolds numbers deviations from the k −5/3 -law are to be expected, and a bottleneck effect is also seen in experiments [107]. However, it is difficult to say whether the appearance of the bottleneck effect in underresolved simulations bears any physical meaning. Frisch et al. investigated the build-up of energy at high wave numbers when explicit hyperviscosity is introduced [108]. They explain the presence of the energy bottleneck as an effective increase in turbulent viscosity originating from the energy bump at even higher wave numbers. Since the small scales contain more energy than predicted by the Kolmogorov theory, the intermediate scales are mixed more strongly than predicted by Kolmogorov's theory such that they dissipate more quickly. Figure 19 depicts the effective viscosity for the two simulations. It is seen that the two coarsest resolutions follow the nominal viscosity only in the beginning of the simulation. Later on they show effective viscosities independent of the nominal viscosity. Only the highest resolution simulation is obviously affected by the nominal viscosity.
We repeat the simulation at the higher Reynolds number with the KF3L and the K17W method and show the results in Figs. 20 and 21. Even though a better agreement between the different resolutions is observed for the KF3L method as compared to the K17W method, the deviation from the ideal k −5/3 is still observed. Results from the K17W method with its explicit turbulence model agree better with Kolmogorov theory even though a small bottleneck effect is still visible. Quite interestingly, it is observed that adding explicit turbulent viscosity in the K17W method enhances the energy in the intermediate range compared to not using an explicit turbulence model.
The energy bottleneck is obviously influenced by the insufficient resolution and the mismatch between the nominal and the effective viscosity. Still it is instructive to briefly discuss the physical bottleneck effect. Falkovich [109] explains that the presence of molecular viscosity increases turbulence levels by inhibiting turbulent transfer. Donzis and Sreenivasan argue that the amplitude of the bottleneck effect decreases with increasing Reynold number such that the effect is not visible from observations in atmospheric flows which have extremely high Reynolds number [110]. In laboratory experiments the effect is difficult to study as it requires sufficiently high Reynolds numbers and is visible only for extremely high wave numbers. To overcome the experimental difficulties, Küchler et al. [107] employed a very sophisticated experimental setup using sulphurhexaflouride at pressures of up to 15 bar in order to obtain Taylor micro-scale Reynolds numbers of up to 1600 and Kolmogorov scales of ten microns. This provides sufficient measurement accuracy to investigate the bottleneck effect. According to a theoretical investigation by Verma and Donzis [111] the bottleneck effect is suppressed if the inertial range spans more than four decades. This obviously also requires the resolution to span four decades. Our finest grid supports only a spectrum spanning two decades. As our simulations do not reach DNS quality, the location and strength of the bottleneck is obviously influenced by the effective viscosity such that we cannot claim that the observation of a bottleneck in Fig. 20 was physically correct. While the bottleneck effect is a physical reality for turbulent flows with inertial ranges larger than one decade but smaller than four decades, the strength and location of the bottleneck is most likely not accurately represented in our simulations due to the unphysical cut-off of the spectrum at resolutions below DNS quality.

Conclusion
In this paper we presented a comprehensive study of the performance of cumulant-based lattice Boltzmann schemes using the three-dimensional Taylor-Green Vortex benchmark. Several other lattice Boltzmann model variants have also been considered for comparison. The improved BGK+ method was found to be less stable than the single relaxation time cumulant method. We should however mention here that the difference in stability is not large. A preliminary search for the highest attainable Reynolds number at resolution 64 × 64 × 64 gave Re = 1529 for the BGK+ and Re = 1980 for the KSRT at the highest used Mach number. For the K17 and KF3 method a Reynolds number of 3000 could be achieved. These values are preliminary in the sense that according to our observation simulations that became unstable at a low Reynolds number could stay stable at a higher Reynolds number. In the literature it is often assumed that there is a certain cut-off Reynolds number above which all simulations would become unstable. While this assumption appears to be reasonable, we do not know of any proof for it and actually observed several counter examples. One counter example was also observed by Nathen et al [88] where a MRT model was seen to decrease in stability when the resolution was  Fig. 18 Energy spectra for Reynolds numbers 16,000 (left) and 160,000 (right) using the K17L method. Some deviation from the ideal k −5/3 can be observed in a slight build-up of energy at higher wave numbers increased. We therefore prefer to be very cautious with giving explicit limits for stability. For the current setup, all but the BGK+ method were stable for all test cases. Limiters or a sub-grid scale model were hence not required for stability as long as the Reynolds number was low enough (i.e. Re = 1600). We note here that both the sub-grid scale model and the limiter drastically improve the stability. However, stability was not the concern in the presented study and stability should also not be the primary motivation for the application of a sub-grid scale model, that is to say at least not, if the sub-grid scale model is motivated by physics-based considerations to incorporate the influence of unresolved scales on the larger scales. The observation of this   study is that the WALE model successfully filters out high wave numbers from the simulation. This results in an under-prediction of the high wave number part of the energy spectrum and a general under-prediction of the enstrophy. All of this is expected from a sub-grid scale model. For Reynolds number 1600, omitting the WALE model systematically improves the correspondence between the simulations and reference results even for the lowest resolution which is clearly under-resolved. At this Reynolds number the application of a turbulence model appears to be neither necessary nor helpful to increase both accuracy and efficiency. We saw that applying a limiter to the third-order cumulant leads to a much steeper cut-off in the Kolmogorov spectrum than using the WALE model. This is also obvious from the different asymptotics. While the WALE model adds a turbulent viscosity of the order of O( x 2 ) in diffusive scaling, the limiter was designed not to reduce the fourth-order accuracy of the K17 method and hence becomes significant at much higher wave numbers than the turbulent viscosity of the WALE model. On top of this, the limiter is a local operation with virtually zero computational cost, while the WALE model depends on costly computations of vorticity. However, at substantially higher Reynolds number the picture changes. At a Reynolds number of 160,000 we observe that the K17L and KF3L methods show a deviation from the ideal Kolmogorov spectrum at high enough wave numbers, showing an energy bottleneck followed by an energy bump at sufficiently high wave numbers. To our knowledge this is the first time that such a behaviour is observed in a lattice Boltzmann simulation.
We also confirmed previous results showing that regularization in the lattice Boltzmann context leads to a significant dependency on the time step which is just the opposite of what is expected from a good numerical scheme: Results generally get worse with shorter time steps. In contrast to the regularized K15 method, no dependence on the time step was observed for the optimized K17 and KF3 methods.
In conclusion, we observed that the cumulant lattice Boltzmann method simulated the Taylor-Green vortex successfully at moderate levels of under-resolution. These results are naturally limited to isotropic turbulence at moderate Reynolds numbers. The observation of an energy bottleneck at very high Reynolds numbers in the cumulant method is interesting and requires further investigations. We note that the energy bottleneck is a physical reality which has also been observed in recent experiments. However, since our simulation does not reach DNS quality for the Reynolds numbers in question, we do not claim that the observed bottleneck effect is physical. As the physical bottleneck effect is assumed to disappear at extremely large Reynolds numbers, the numerical bottleneck effect could also be regarded as an artefact that has to be mitigated by the use of an explicit turbulence model. In this sense, the question whether adding a WALE model to the cumulant method is advantageous is still open due to different interpretations and focuses.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Funding Open Access funding enabled and organized by Projekt DEAL.

Appendix A: KF3L cumulant kernel
This appendix lists the KF3L collision operator for the convenience of the reader. The KF3, K17L, K17 and K15 operators can also be obtained from the following by omitting the corresponding corrections as explained at the end of this Appendix.
Appendix A.1: Parametrization for fourth-order convergent diffusion For obtaining fourth-order convergence for the diffusion term, a specific parametrization has to be chosen [26] composed of the relaxation rates for third-order cumulants ω 3 , ω 4 and ω 5 and two parameters A and B to be introduced in the equilibrium of fourth-order cumulants: .
Appendix A.2: Forward central moment transformation We define the well-conditioned distribution functions f i jk by subtracting the background density: with w 000 = 8/27, w 100 = 2/27, w 110 = 1/54, w 111 = 1/216 and so on by permuting the indices. The fast central moment or Chimera transform uses a divide-and-conquer approach to minimize both the number of operations and the influence of round-off errors. In order to apply the Chimera transform to the well-conditioned distributions certain constants have to be derived from the constant background distributions w i jk : With this central moments can be computed from: Here we write indices in Miller notation with1 = −1. Compared to the earlier publications of the Chimera transform [21,26] we include the inverse of the constants , i.e. K −1 i j|0 , K −1 i|0γ and K −1 0βγ . This appears to be an unnecessary increase in floating point operations, but it serves a certain purpose. The constants K i j|0 , K i|0γ and K 0βγ are primarily multiples of 1/3. These fractions have the disadvantage of not being exactly represented in floating point arithmetic. A detailed investigation on the original Chimera transform as in [26] has revealed that the unavoidable round-off errors coming from the inexact representation of the constants tent to be dominant in a certain direction. This is most likely due to the fact that the Chimera transform has to be performed in a certain order. In the presented form, the transform processes the z-direction first, followed by the y-direction and ending with the x-direction. Algebraically, the order of the directions should not matter, but the inexact constants lead to a directional bias of the round-off errors. In double-precision calculations this difference is very small, but in single-precision calculations this directional bias can lead to noticeable unisotropies in the final result. One way to avoid this problem is to multiply all terms in the sum which are not multiplied by the constant by the inverse of the constant and then multiply the entire sum by the constant. By doing so, the inverse of the constant should be accurately precomputed. For example (1/3) −1 → 3. the inverse of all constants used in the transformation can be represented exactly in floating point arithmetic. It is of note that this measure does not decrease the magnitude of the round-off errors. It is meant to avoid the unisotropy observed in the original version of the Chimera transform.

Appendix A.3: Forward cumulant transformation
The non-conserved cumulants c αβγ are computed from the central moments. There is no simple general equation for the computation of cumulants (see [21] for the theory). Since the equilibrium of most cumulants is zero, the normalization is omitted in the following and we define: In what follows, we omit equations for cumulants which can be obtained by permuting indices in the listed equations. The first few cumulants are seen to be identical to the first few central moments: The asterisk indicates the post-collision value of any quantity. Due to the anisotropy of the underlying lattice, it is necessary to add a few correction terms to the equilibrium. These terms depend on the knowledge of the first-order velocity gradients ∂ x u, ∂ y v and ∂ z w which can be approximated by D x u, D y v and D z w computed from second-order cumulants in the following way: The advection correction in the KF3 method requires additional derivatives that cannot be easily computed from the distribution function f i jk [32]. There are different possibilities to compute these derivatives. Here we choose to introduce an additional distribution function g i jk with six discrete velocities from which the following derivatives are computed: The distributions g i jk undergo the same streaming algorithm as the distributions f i jk which means that no non-local operations are introduced into the lattice Boltzmann algorithm other than the usual streaming step and the EsoTwist algorithm for streaming can be applied [112]. However, a different way of computing finite differences might also be used.
The following cumulants have nonzero equilibrium due to the mentioned anisotropy of the discretization and to incorporate the advection correction: The equilibria for all remaining cumulants are zero in the original cumulant method [21]. In [26] some of the equilibria have been modified to obtain fourth-order convergence of the diffusion term: and so on. The limit λ is an empirical parameter used to keep the growth in the third-order cumulants bounded. For λ → 0 the regularized cumulant method K15 is recovered if at the same time A = B = 0. For λ → ∞ the limiter is turned off. In the simulations shown in this paper we used λ 3 = λ 4 = λ 5 = 10 −2 for the K17L and KF3L simulations and λ 3 = λ 4 = λ 5 = 10 6 for the K17 and KF3 simulations. Both values are arbitrary and we do not imply that they are optimal choices. In particular, we do not imply that taking the same value for all cumulants was a good choice. The values chosen here are the same as in [69] where they were found to lead to good stability for a wide range of simulations. Increasing λ makes the limiter weaker. Larger values improve accuracy, while smaller values improve stability. In [113] "Appendix B" the limiter for under-resolved turbulent channel flow was selected as high as possible while keeping the method stable. This resulted in a Reynold number-dependent λ which was typically one order of magnitude larger than the value chosen here. The original cumulant method [21] is recovered by setting A and B equal to zero. In this appendix we demonstrate a certain pitfall that has to be taken into account when investigating the production and decay of enstrophy. As we mentioned earlier, the interesting property of the enstrophy in a turbulent flow is that it is dominated by the smallest eddies and can therefore be used as a measure of how much of the turbulence is resolved. For an under-resolved simulation this implies that the dominant frequencies are located at or close to the grid spacing wave number which in turn implies that the measurement of the enstrophy can be very sensitive to the method by which the enstrophy is computed. The naive way of computing enstrophy is to use a second-order central difference operator for the computation of the vorticity. That would, of course, not be compatible with a method of fourth or higher order of convergence as the method by which the scheme would be evaluated would be inferior in accuracy to the scheme itself. In order to demonstrate the influence on the finite difference stencils we compare the eighth-order finite difference stencil used in the previous simulations to the naive second-order convergent stencil in Fig. 22. It is observed that a significant amount of enstrophy is lost when utilizing the low-order approach to evaluate enstrophy. We emphasize here that the vorticity is computed with the naive method in the K17W model for determining the eddy viscosity as everything else would be too expensive.

Appendix C: Influence of using double or single precision
In this appendix we study the influence of numerical precision by comparing lattice Boltzmann simulations in single (32 bit) and double (64 bit) floating point precision. We use the optimized parameter version without advection correction. The results in Fig. 23 indicate a visible influence of the numerical accuracy on the results. In particular, it is observed that the integral kinetic energy shows a non-physical rise at the beginning of the simulation. At the highest resolution the enstrophy for the lattice Boltzmann method in single precision is slightly larger than the reference value which did not happen in any of the other simulations. It is hence seen that numerical problems in the computation of cumulants still remain even after the applied measures to improve the conditioning as discussed above. However, it can also be noted that even though there is a notable influence of the floating point precision on the result, even at single precision the cumulant method with optimized relaxation rates remains stable.