Towards the Suitability of Information Entropy as an LES Quality Indicator

The Shannon entropy is a rigorous measure of the complexity of dynamical systems. It can be calculated directly from any set of experimental or numerical data and quantifies the uncertainty of a given dataset. Originating in information theory, the concept generalizes from assessing the uncertainty in a message to any dynamical system. Following the concept of ergodicity, turbulence forms another class of dynamical systems, which is generally assessed using statistical measures. The quantification of resolution quality is a crucial aspect of assessing turbulent-flow simulations. While a vast variety of statistical measures for the evaluation of resolution is available, measures that more closely represent the dynamics of a turbulent system, such as the Wasserstein metric or the Lyapunov exponent, are becoming popular. This study investigates how the Shannon entropy can yield useful insights into the quality of turbulent-flow simulations. The Shannon entropy is calculated from distributions, which enables direct evaluation during unsteady flow simulations or in post-processing. A turbulent channel flow and a planar turbulent jet serve as validation tests. The Shannon entropy is calculated for turbulent velocity and scalar fields, and correlations with physical quantities, such as turbulent kinetic energy and passive scalars, are investigated. It is shown that the spatial structure of the Shannon entropy can be related to flow phenomena. This is illustrated by investigating the entropy of the velocity fluctuations, passive scalars and turbulent kinetic energy. Grid studies reveal the Shannon entropy as a converging measure. It is demonstrated that classical turbulent-kinetic-energy-based quality measures struggle to identify insufficient resolution, while the Shannon entropy shows potential to form a solid basis for LES quality assessment.


Introduction
Turbulence theory has evolved and many statistical measures are now available for describing turbulence through statistical analysis tools. Much of our knowledge on turbulence results from direct numerical simulation (DNS) experiments, where the largest available computers generate detailed databases at increasing turbulent Reynolds numbers. Considering these vast databases, one may ask about the properties of information in these simulation results, how these properties can be quantified, and what these properties can teach us about turbulence itself. In this context, the concept of information entropy is considered, which was introduced by Shannon (2001) for a discrete random variable. The present paper examines how this concept can be applied to turbulent flows to obtain insight into simulation accuracy.
At this point, large-eddy simulation (LES) can be considered the state-of-the-art technique for simulations of turbulent flows in engineering systems. However, the assessment of simulation accuracy and quality of LES is in its infancy when it comes to quantifying uncertainty and estimating a simulation's accuracy or quality, in particular from the simulation itself. Several quality estimators have been proposed, often based on energy-related criteria. Given that it is very hard to find a truly sufficient and general quality estimator for LES, it is worth exploring other options that may contribute to a more reliable error estimator. In this context, the concept of information entropy is tested and it is investigated what can be learned from it about LES quality.
A large number of statistical quantities can be considered to describe a turbulent flow, including spectral information, spatial correlations, length scales, structure functions and others, which often require joint measurements or sampling at two different points in space. An overview of commonly used statistical quantities is given in Pope (2001) or Wilcox et al. (1998). Several of these quantities are evaluated in the context of resolution quality by Davidson (2009).
Easier to use and to consider are one-point statistics, for which the velocity probability density function (PDF) provides the complete picture of the turbulence, although usually only its first two moments are used to quantify turbulence. In LES, the related filtered density function (FDF) provides the distribution of the velocities within a filter volume. The first moments of the PDF are the mean velocities; its second moments are the Reynolds stresses u'_i u'_j in Reynolds-averaged Navier-Stokes (RANS) simulations, while the second moments of the FDF are the subgrid stresses in LES, i.e. the difference between the filtered product of the velocities and the product of the filtered velocities. The Reynolds and subgrid stresses emerge as unclosed terms from averaging or filtering the Navier-Stokes equations, respectively, and need to be modeled. These stresses can be contracted into a (subgrid) turbulent kinetic energy (1/2) u'_i u'_i. Higher moments can be derived and may appear in some modeling approaches, but we aim for a different description of the PDF.
To quantify the probability of the joint occurrence of two quantities, two-point correlations and auto-correlations can be calculated, and integral length scales can be derived. Two-point correlations have also been used as an approximate measure for the resolution of an LES Davidson (2009).
A Fourier transformation can also be used to describe a quantity in spectral space. A function of length is thus transformed into a function of wavenumber, leading to energy spectra, which are a common tool to assess how energy is distributed over the turbulent scales. To obtain an impression of the quality of a simulation, results are sometimes compared to Kolmogorov's −5/3 law for the inertial subrange Davidson (2009).
The need for quantitative methods to assess the quality of LES has led to numerous studies Celik et al. (2005); Klein (2005); Geurts and Fröhlich (2002). With implicit (LES) filtering, the physical model for the subgrid scales, the choice of numerical algorithm, grid resolution and filter size determine the quality of an LES Nastac et al. (2017).
A well-known quality criterion is attributed to Pope, which uses the ratio of resolved turbulent kinetic energy k_res to the sum of resolved and modeled turbulent kinetic energy k_res + k_sgs as an indicator for LES quality Pope (2004). It is suggested that the ratio should achieve values of at least 80% for the LES to be considered sufficiently resolved:

(1) k_res / (k_res + k_sgs) > 0.8

The criterion will indicate high resolution quality if k_sgs is small, even falsely when a low value of k_sgs is caused by a small model constant or a dissipative numerical scheme Nguyen and Kempf (2017); Davidson (2009). The subgrid-activity parameter s proposed by Geurts and Fröhlich (2002) aims to quantify the total amount of modeling and thus the possible modeling error of an LES. The parameter is calculated from the turbulent dissipation rate ⟨ε_t⟩ and the molecular dissipation rate ⟨ε⟩ as s = ⟨ε_t⟩ / (⟨ε_t⟩ + ⟨ε⟩). A subgrid-activity parameter of s = 0 would describe a DNS, while a parameter of s = 1 represents an LES at infinite Reynolds number Geurts and Fröhlich (2002).
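The failure mode of the energy-ratio criterion can be made concrete with a small numerical sketch. The following Python snippet evaluates Pope's ratio k_res / (k_res + k_sgs) pointwise; the probe values are purely hypothetical, not data from this study. Note how the third probe passes the 80% threshold simply because k_sgs is small in absolute terms:

```python
import numpy as np

def pope_criterion(k_res, k_sgs, threshold=0.8):
    """Pope's resolution indicator M = k_res / (k_res + k_sgs); an LES is
    commonly considered sufficiently resolved where M exceeds ~0.8."""
    m = k_res / (k_res + k_sgs)
    return m, m > threshold

# Illustrative (hypothetical) resolved and modeled TKE at three probe locations.
k_res = np.array([0.9, 0.5, 0.008])
k_sgs = np.array([0.1, 0.5, 0.001])
m, ok = pope_criterion(k_res, k_sgs)
# The last probe resolves almost no turbulent kinetic energy in absolute terms,
# yet the criterion reports high "quality" because k_sgs happens to be small,
# as could be caused by a dissipative scheme or a small model constant.
```
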
The index "LES IQ" tries to assess the quality of LES with implicit filtering Celik et al. (2005). The LES IQ has a similar structure to Pope's approach in utilizing a ratio of the resolved and total turbulent kinetic energy, LES_IQ = k_res / k_tot, but the subgrid scales are treated with Richardson extrapolation. The total turbulent kinetic energy k_tot = k_res + a_k h^p relies on the grid size h and the order p of the numerical scheme. The coefficient a_k is obtained from the extrapolation, which requires simulations on the grid of interest and on a further refined grid Celik and Karatekin (1997); Roache (1998).
An approach that characterizes the error contributions of numerics and modeling separately was developed by Klein (2005) and Freitag and Klein (2006). The strategy is to vary model parameters to control the contribution of the model manually, which yields different numerical- and model-error contributions. Both errors are quantified by a Taylor-series expansion of the numerical and modeling error terms. This requires solutions of the investigated LES, of an LES with modified model contribution and of a coarsened-grid LES Klein (2008). The method holds for a range of applications, but needs to be further evaluated for high-Reynolds-number flows and different numerical schemes Klein (2005); Freitag and Klein (2006).
Turbulence captured by LES exhibits unsteady behavior and sensitivity to the initial solution, which draws attention to the treatment of LES as a dynamical system and hence puts the general aspect of information stored in the flow field into the spotlight, rather than classical physical quantities. The fact that LES aims to capture turbulent dynamics gives reason to favor a dynamical measure over conventional measures that assess the resolved temporal and spatial scales directly. Many quality measures in common use share the property of comparing a physical value produced by the LES to an experimental or estimated "true" value, whereas dynamic measures tend to quantify the frequency of occurrence of a certain solution or the degrees of freedom of the solution state Ruelle (1979); Eckmann and Ruelle (1985); Wu et al. (2018).
One example of a dynamics-based measure is the Lyapunov exponent, which assesses the separation of the solutions of an LES when the initial solution is perturbed. While the perturbed and unperturbed solutions separate until non-linear saturation in a chaotic system, the exponent quantifies the average separation rate of the solutions. Hence, the Lyapunov exponent depends solely on the turbulent dynamics in an LES calculation Wu et al. (2018); Nastac et al. (2017).
Another example is the Wasserstein metric, which is based on the "distance" between two states of a dynamical system. In the case of random fluctuations, the states may be scattered close to each other, so that the distance becomes small. A single state is not meaningful for such data and should be replaced by a distribution instead. The Wasserstein metric can then be seen as a measure of "dissimilarity" between the data of two different states Johnson et al. (2017); Wu et al. (2018).
Entropy measures form a popular choice in the study of dynamical systems. The concept of Shannon's entropy was brought into the study of dynamical systems by Kolmogorov (1965). Although there are many forms of entropy in the context of dynamical systems, they are all related, as they assess the complexity of a given set. This can be divided into the study of the properties unpredictability, incompressibility, asymmetry and delayed recurrence Hillman (1998); Wyner (1994).

Shannon Entropy
From classical statistical thermodynamics, the measure for the number of microscopic configurations describing a thermodynamic system comes to mind. Thermodynamic entropy has been used to assess dissipation and heat fluxes of an incompressible turbulent shear flow Kock and Herwig (2004). The entropy production was used to investigate numerical stability Naterer and Camberos (2003); Merriam (1989); Dutt (1988). The question arises whether another form of entropy might lead to further insight into turbulence.
Information entropy was first introduced by Shannon in 1948 to express the term "information" mathematically Shannon (2001). In communication theory, any exchange of messages requires a source and a destination. The source contains a defined set of messages of cardinality N, where each message i = 1, …, N has a probability p_i of being sent. Each message carries its own value of information I_i, where a gain of information I means removal of uncertainty, and less likely messages carry higher values of information.
The value of information of the message i depends on its probability p_i. The distribution of messages i in the set A of a physical quantity X is treated as the alphabet of all messages i. The set A can be the result of, e.g., a series of measurements. In general, A can be described as a set of tuples of the form A = {(x_i, p(x_i)) | i = 1, …, N}, with x_i being the values of X which occur during the measurement.
Based on these formalities, the value of information can be formulated as I_i = −log2 p(x_i), with p_i = p(x_i). By definition, the binary logarithm is chosen to obtain the unit bit.
The statistical average of the value of information I_i for i = 1, …, N leads to the Shannon entropy H. It carries the name entropy because of its formal analogy to the formulation of entropy in statistical thermodynamics pioneered by Ludwig Boltzmann. Although arbitrary quantities in fluid dynamics are traditionally denoted as φ, the symbol X is chosen in this work to stay consistent with the literature on information theory. The Shannon entropy of a quantity X, given in a set A, is defined as

H(X) = − Σ_{i=1}^{N} p(x_i) log2 p(x_i)

The Shannon entropy is a rigorous measure of uncertainty. Its calculation is based solely on the probability distribution of the observed quantity X. The contribution to the Shannon entropy is set to zero for probabilities p_i = 0, to enforce that messages i which are never sent lead to no gain of information. As opposed to the standard deviation, the absolute values of x_i do not have any influence on the Shannon entropy.
The information entropy differs from other statistical measures in being a rigorous measure. For the calculation of the entropy, only the probability is statistically relevant, while the event itself does not matter. Therefore, the investigated property does not have to be of numerical nature. The information entropy depends on the shape of a distribution and is invariant to permutation, as demonstrated in Fig. 1. Statements about multimodal probability distributions can be derived easily from the information entropy, while other statistical measures commonly require high-order statistics. The information entropy can be applied in unsteady and transient flows. It is independent of specific models (such as eddy-viscosity models providing a turbulent viscosity) and of geometry. The choice of sampling space is free, which also allows application of the information entropy to local parts of the flow. An important aspect of a quality measure for LES is the additional (computational) effort to be invested in a study, as measures which require additional calculations may be less attractive due to their computational cost. In contrast to some other measures, the presented form of the information entropy is easy to calculate, can be evaluated on-the-fly using a probability mass function based on the present set of samples at negligible computational cost, and could be obtained for experimental data, even enabling quantitative comparisons.
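The histogram-based estimation and the invariance properties described above can be sketched in a few lines of Python. The helper below and the Gaussian test data are illustrative choices, not the implementation used in this work:

```python
import numpy as np

def shannon_entropy(samples, n_bins):
    """Shannon entropy (in bits) of a sample set, estimated from a histogram (PMF)."""
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts[counts > 0] / counts.sum()   # empty bins carry no information by definition
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 100_000)           # illustrative stand-in for measured data

h = shannon_entropy(x, 64)
h_permuted = shannon_entropy(rng.permutation(x), 64)  # same multiset of values
h_shifted = shannon_entropy(x + 10.0, 64)             # same distribution shape

# Permuting the samples or shifting all values by a constant leaves the shape
# of the distribution unchanged, so the entropy is unaffected: only the
# probabilities matter, not the events themselves.
```
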
To illustrate the link between the Shannon entropy and a physical quantity such as a velocity field, the evolution of a flow field in a mixing layer and its entropy is used as a tangible example. Figure 2 shows the development of the flow field for the mixing of three uniform velocity profiles of different magnitude. With increasing distance x from the nozzle, the mixing layer evolves and the velocity gradients decrease due to shear. The evolution of the flow field shows the formation of a continuous spectrum of velocity values from the originally three discrete velocities as mixing proceeds. This leads to the occurrence of more velocity values. Defining the velocity value as the baseline statistical event for the Shannon entropy, the probability of a velocity value occurring in a certain state of mixing can be assessed, and the Shannon entropy can be calculated for each velocity profile. The profiles reveal that a more continuous range of velocities appears with an advanced state of mixing, which can be seen as a set of more distributed events, leading to an overall increase of the Shannon entropy. In addition, it can be seen that changing the magnitude of the velocities does not affect the entropy, as it is a rigorous measure.
Fig. 2 (bottom): probability distributions (PMF) at these locations, derived from the velocity profiles. Note the increase of the information entropy H with proceeding state of mixing and the independence of H from the initial velocity values u and u*.

Demonstration Case: Lorenz-Attractor
The Lorenz attractor Lorenz (1963) is a system of non-linear ODEs that arises from the simplified Navier-Stokes equations for a problem of thermal convection between two plates. It exhibits chaotic solutions and is therefore interesting for studying uncertainty:

Ẋ = σ(Y − X), Ẏ = X(ρ − Z) − Y, Ż = XY − βZ

The classic set of parameters is σ = 10, ρ = 28 and β = 8/3, which results from the non-dimensionalization of the Navier-Stokes equations, where σ is the Prandtl number and ρ is related to the Rayleigh number. The temporal derivatives Ẋ, Ẏ and Ż correspond to the velocity components u, v and w. Since the Lorenz attractor is a simple chaotic system, it is a well-suited demonstration case for the calculation and evaluation procedure for the information entropy. The Lorenz system is solved using a Runge-Kutta method of fourth-order accuracy. A timespan of 300 seconds is chosen to generate the trajectory. The velocities tangential to the trajectory are calculated at every point along the curve and their empirical probabilities p(Ẋ), p(Ẏ) and p(Ż) are computed. A probability distribution is generated for each velocity component u, v and w, for which the Shannon entropy is then evaluated. Based on the studies of Camesasca et al. (2006), Archambault (1999) and Perugini et al. (2015), the entropy is normalized with the binary logarithm of the number of bins; thus, the influence of the chosen number of bins can be neglected. Being a scalar quantity, the information entropy is calculated for each velocity component u, v and w separately.
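A minimal sketch of this procedure, assuming a standard classical RK4 integrator, the classic parameter set and an illustrative bin count of 128, could look as follows (the step size and sample counts are arbitrary choices, not those of the original study):

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system with the classic parameters."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_trajectory(x0, dt, n_steps):
    """Integrate the Lorenz system with a classical fourth-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for n in range(n_steps):
        s = traj[n]
        k1 = lorenz_rhs(s)
        k2 = lorenz_rhs(s + 0.5 * dt * k1)
        k3 = lorenz_rhs(s + 0.5 * dt * k2)
        k4 = lorenz_rhs(s + dt * k3)
        traj[n + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

def normalized_entropy(samples, n_bins):
    """Shannon entropy of a histogram-based PMF, normalized by log2(n_bins)."""
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(n_bins)

# 300 time units of trajectory, then the tangential velocities along the curve.
traj = rk4_trajectory(np.array([1.0, 1.0, 1.0]), dt=0.01, n_steps=30_000)
velocities = np.array([lorenz_rhs(s) for s in traj])   # (Xdot, Ydot, Zdot)
H = [normalized_entropy(velocities[:, i], 128) for i in range(3)]
```
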
The Lorenz attractor is used as a well-defined reference case for entropy studies in turbulent flows. Figure 3 shows the velocity components along the trajectory with their respective probability distributions, to give a first impression of the probability distributions used for the calculation of the information entropy and of how typical fields of the given case look. The value of each velocity component along the trajectory is given by the color at the respective point, and the distribution of the respective velocity component is shown on the right-hand side. The bar plot visualizes the Shannon entropy calculated from the probability distributions and the contribution of each velocity component to the kinetic energy.
The probability distributions of the Ẋ and Ẏ components are symmetric. The distribution of Ż is skewed, which results from the fast downward motion on the trajectory in the center of the attractor and the slow upward motion on the periphery of the discs. The main propagating direction along the attractor is the Z direction, which leads to the high contribution of Ż to the total kinetic energy, while the contribution of Ẏ to the kinetic energy is low. The attractor features only small motions in the Y direction, as the attractor can be seen as two merged discs which hardly expand in the Y direction. The X direction shows a medium contribution to the kinetic energy. Velocities are larger on the outer rings of the attractor and slower on the inner rings. At the outer turning points in the X direction, the velocity changes by the same magnitude while its direction is reversed. Therefore, the distribution appears very symmetric.
The Shannon entropy is calculated based on the shown distributions for each velocity component, leading to H_n(Ẋ), H_n(Ẏ) and H_n(Ż). The Shannon entropy of Ẋ is the lowest, as its probability distribution is relatively narrow compared to the other variables. This leads to a smaller range of statistical events, which indicates a lower Shannon entropy. The entropy values H_n(Ẏ) and H_n(Ż) are higher than H_n(Ẋ), as the probability distributions cover a wider range of statistical events.
The Lorenz attractor has been an object of interest for further studies of dynamical measures such as the Lyapunov exponent or the Wasserstein metric, due to its dynamics being representative of those observed in LES and DNS. The Lyapunov exponent is a dynamical measure which can be used to characterize the dynamical processes in a turbulent system and can lead to statements about the predictability time of a turbulent system. A further analysis can be found in Nastac et al. (2017); Wu et al. (2018).

LES and DNS of a Channel Flow and Plane Jet
The DNS and LES rely on the in-house code PsiPhi, developed and used at the University of Duisburg-Essen and at Imperial College London. The code is based on a finite-volume technique on equidistant, isotropic Cartesian grids, relying on fourth-order central differencing for the transport of momentum, total-variation-diminishing (TVD-CHARM) schemes for passive and reactive scalars, pressure correction through a projection method, and a nominally third-order low-storage Runge-Kutta scheme Man and Moin (1991) for time integration at a CFL number of 0.7. Parallelization is achieved through domain decomposition and MPI. The code has been used in many LES studies Stein et al. (2011); Pettit et al. (2011); Rieth et al. (2017), some of them highly resolved, achieving DNS resolution away from the burner, and for DNS of pulverised coal flame ignition Rieth et al. (2018). Turbulent initialization and inlet data are obtained by an efficient implementation Kempf et al. (2005) of Klein's well-known inflow data generator Klein et al. (2003). For the LES, Nicoud's Sigma model Nicoud et al. (2011) and the classic Smagorinsky model Smagorinsky (1963) were tested, the latter to make the findings easier to interpret and transferable to other simulations Rieth et al. (2014).
The channel flow simulations at Re_τ = 395 and Re_τ = 934 have been performed with the grids listed in Tables 1 and 2. The DNS of Moser et al. (1999) and Hoyas and Jiménez (2006) use non-uniform grid spacing in the wall-normal direction. In this LES, cubic cells are used, leading to better resolution in the streamwise and spanwise directions and hence providing more samples for the generation of the probability distributions. Periodic boundary conditions are used in the streamwise and spanwise directions, while an immersed-boundary approach allows for the representation of walls in the wall-normal direction Peskin (1972).
A plane turbulent jet at Re = 10,000, based on the DNS by Klein et al. (2003), serves as a further test case. The computational domain extends 20D × 20D × 6.4D in the streamwise, wall-normal and spanwise dimensions, with D being the nozzle width. Periodic boundary conditions are applied in the spanwise dimension, and the boundaries in the wall-normal dimension are treated with velocity data from the DNS by Klein. A filter by Anderson and Domaradzki (2012) is applied near the outlet boundary in the streamwise direction to ensure unperturbed outflow. For a further description of the case design, the authors refer to a previous study Engelmann et al. (2021). The grids used are summarized in Table 3.

Calculation of the Information Entropy
The Shannon entropy is based on a discrete probability distribution, which requires a probability mass function (PMF, a probability distribution with finite bins). To calculate a probability distribution, a field of any physical quantity is sufficient. The PMF is generated on a single plane using histograms of the desired quantity, which decompose the continuous range of real values into a number N of discrete bins. The binning requires a width to separate the continuous range of physical values into discrete intervals. However, choosing any bin width appears to be arbitrary and violates the idea of finding an independent measure. Hence, the information entropy is normalized with the binary logarithm of the number of bins N Camesasca et al. (2006); Perugini et al. (2015), so that the possible entropy values range between 0 and 1:

H_n(X) = H(X) / log2(N)

Another aspect is the choice of the mathematical borders between which the binning is performed. While physical quantities can range between (−∞, +∞), a given field of a physical quantity will be limited by the highest and lowest values X_max and X_min occurring in the field. Applying histograms to values in the intervals (−∞, X_min] and [X_max, +∞) would lead to empty bins and hence probabilities of zero, which, by definition, do not contribute to the value of entropy. Therefore, bins are only applied within [X_min, X_max]. Although different fields will feature different values of X_min and X_max, the entropy of a set of fields X remains comparable, as values outside of this interval cannot influence the entropy and the values inside of this interval are converged with respect to the number of bins. An algorithm is used to calculate the normalized entropy for gradually refined bin widths. The converged value of the entropy is then chosen to represent the distribution.
(Table caption: the coordinates x, y and z correspond to the streamwise, wall-normal and spanwise directions; N refers to the total number of cells.)
A short summary of the necessary steps can be found in Table 4.
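The procedure summarized in Table 4 can be sketched as follows; the starting bin count, the convergence tolerance and the synthetic Gaussian test field are illustrative assumptions, not values prescribed by the study:

```python
import numpy as np

def normalized_entropy(field, n_bins):
    """Normalized Shannon entropy H_n(X) = H(X) / log2(n_bins) of a field, via a
    PMF with n_bins histogram bins spanning [X_min, X_max] (steps 4-5 of Table 4)."""
    counts, _ = np.histogram(field, bins=n_bins)   # binning within [X_min, X_max]
    p = counts[counts > 0] / field.size            # zero-probability bins are dropped
    return -np.sum(p * np.log2(p)) / np.log2(n_bins)

def converged_entropy(field, n_bins=16, tol=1e-3, n_max=2**16):
    """Refine the binning (here by doubling n_bins) until the change in H_n
    vanishes within a tolerance (step 6 of Table 4)."""
    h_old = normalized_entropy(field, n_bins)
    while n_bins < n_max:
        n_bins *= 2
        h_new = normalized_entropy(field, n_bins)
        if abs(h_new - h_old) < tol:
            return h_new
        h_old = h_new
    return h_old

# Synthetic stand-in for a wall-parallel slice of a fluctuation field.
rng = np.random.default_rng(1)
u_prime = rng.normal(0.0, 1.0, 200_000)
H_n = converged_entropy(u_prime)
```

In a simulation, step 7 of Table 4 would then repeat this evaluation over multiple timesteps and average the converged values.
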

A-Priori Study of the JHU Channel Flow at Re_τ = 1000
The skill of models and indicators in LES is typically measured by their performance in a-posteriori calculations. However, to investigate the convergence of information-entropy-based measures with respect to the resolution, a-priori studies are a suitable method. Hence, the DNS of the turbulent channel flow at Re_τ = 1000 by Graham et al. (2016) is used as the basis for the a-priori investigation. The data is openly accessible and provided by the Johns Hopkins University (JHU) Turbulence Database. It has to be noted that a-priori cases might give limited insight into the true nature of a model in a-posteriori applications, a prominent example being scale-similarity-type subgrid models for LES Klein et al. (2020).
The friction Reynolds number of the database is Re_τ = 1000 and the velocity-based Reynolds number is approximately Re = 40,000. The domain consists of approximately 1.6 × 10^9 cells, with grid spacings of Δx ≈ 13 and Δz = 7 viscous wall units, while the non-uniform Δy follows a hyperbolic tangent profile. Hence, filtering was applied following the classical channel flow literature, e.g. Clark et al. (1979) and Piomelli et al. (1988), applying a Gaussian filter kernel in the streamwise and spanwise directions.
The DNS data is compared with a-priori evaluations using filter widths of Δ = 1, 2, 5 and 10 Δz. Figure 4 shows an instantaneous slice parallel to the wall in the channel center of the original calculation (top left) and the a-priori filtered fields (indicated by the annotated filter width Δ). The images reveal an increased blurring effect with progressive increase of the filter width. The removal of velocity fluctuations can also be observed in the decrease of the total magnitude of the fluctuations in the field, as shown in the colorbars and in the probability distributions. Further, the probabilities of the fluctuation values close to zero decrease, which is assumed to be a consequence of the filtering. In this context, it has to be noted that the application of a filter on a strong velocity fluctuation in a velocity field can be compared to the filtering of a pulsed top-hat-type signal with a low on/off-time ratio. Sampling a top-hat-type signal is a representative case for generating a bimodal distribution, as the signal is characterized by the alternating low and high values only.

Table 4 Instructions for the calculation of the normalized Shannon entropy H_n(X) for a quantity of interest X:
1. Perform the simulation at the chosen resolution up to a temporally fully converged state.
2. Choose a quantity X for the calculation of the entropy H_n(X).
3. Choose a physically meaningful field of X.
4. Apply histograms with n_b bins within the range [X_min, X_max] to the chosen field and divide by the total number of values in the field to obtain the PMF.
5. Calculate the information entropy H_n(X) for the PMF using Eq. (5).
6. Increase n_b and repeat steps 4 to 6 until ΔH_n(X) vanishes.
7. Sample H_n(X) over multiple timesteps.
Applying a filter to the top-hat signal smoothens the flanks by decreasing and increasing the values at the flanks respectively, thus increasing the total number of values characterizing the signal and therefore introducing more modes into the signal distribution. The generation of more modes in a distribution decreases the probabilities of the modes already present and hence increases the entropy. This effect is reinforced by increasing the filter width, but is counteracted by the progressive decrease of the signal amplitudes, which eventually reduces the entropy due to the removal of modes from the distribution. Therefore, filters are able to both increase and decrease the entropy of a signal, depending on whether the generation of modes outweighs their removal or vice versa.
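This mode-generation and mode-removal argument can be reproduced with a synthetic pulsed top-hat signal and a simple moving-average (box) filter; the signal parameters are illustrative, and for comparability all signals are binned over the common range [0, 1] rather than their individual extrema:

```python
import numpy as np

def box_filter(signal, width):
    """Moving-average (top-hat) filter with periodic treatment of the signal ends."""
    padded = np.concatenate([signal[-width:], signal, signal[:width]])
    filtered = np.convolve(padded, np.ones(width) / width, mode="same")
    return filtered[width:-width]

def normalized_entropy(samples, n_bins=64):
    """Normalized Shannon entropy, binned over the fixed range [0, 1]."""
    counts, _ = np.histogram(samples, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p)) / np.log2(n_bins)

# Pulsed top-hat signal with a low on/off-time ratio: a bimodal distribution.
n = 10_000
signal = np.where(np.arange(n) % 100 < 10, 1.0, 0.0)   # 10% duty cycle

h0 = normalized_entropy(signal)                        # two sharp modes only
h_narrow = normalized_entropy(box_filter(signal, 5))   # flanks smoothed: new modes
h_wide = normalized_entropy(box_filter(signal, 2500))  # amplitude washed out

# A narrow filter raises the entropy by introducing intermediate values at the
# flanks, while a very wide filter lowers it by removing the amplitude
# variation, i.e. by collapsing the distribution back towards a single mode.
```
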
Results of the a-priori analysis can be found in Fig. 5. The entropies of the wall-parallel velocity-component fluctuations H_n(u′) and H_n(w′) were chosen as a generic example. The y+ axis was cropped to allow for a better visualization of the near-wall and centerline regions and for comparison with the following a-posteriori calculations, while still showing the overall entropy behavior when approaching the wall. Both profiles reveal a progressive deviation from the unfiltered DNS data with increasing filter width. The entropy of the streamwise velocity fluctuations reveals a local maximum at y+ ≈ 10, which is approximately where the production term of the turbulent kinetic energy reaches its maximum Buschmann and Gad-el Hak (2006); Moser et al. (1999). This region is governed by eddies with high turnover velocities compared to the rest of the flow, which tend to produce greater velocity-fluctuation magnitudes and therefore extend the interval of possible fluctuation values (the absolute values in this interval span from zero to the highest value of velocity fluctuation) Lozano-Durán et al. (2020). The mean flow occurs in the streamwise direction, while statistically no mean flow occurs in the spanwise direction. This implies that only small differences in the spanwise momentum occur compared to the streamwise direction, leading to absolute spanwise velocity fluctuations being smaller than absolute streamwise velocity fluctuations. Hence, the turbulent kinetic energy is mostly driven by the streamwise velocity fluctuations, which is in accordance with the findings of Moser et al. (1999) or Rieth et al. (2014). Therefore, the peak is more pronounced for the entropy of the streamwise velocity fluctuations. The entropy begins to decrease close to the wall due to the vanishing of the turbulent kinetic energy. The entropy shows an overall increasing trend with increasing filter width.
This is assumed to be a consequence of the overall introduction of modes by the filter, as discussed before.

A-Posteriori LES of the Channel Flow at Re_τ = 395
An LES is performed for six consecutively coarsened grids using Nicoud's Sigma model Nicoud et al. (2011); the resulting mean velocity and turbulent kinetic energy profiles are shown in Fig. 6. A good agreement with the DNS data is achieved on the 0.5, 1 and 2 mm grids, while grids of 4 mm size and coarser show progressively increasing deviations from the reference data. Still, these grids were chosen intentionally to assess whether the Shannon entropy can indicate major differences in the quality of the results.
On the left-hand side of Fig. 7, the instantaneous streamwise velocity fluctuation u′ can be seen on a wall-parallel slice at the channel half-height. The right-hand side shows the probability distributions of u′ obtained by applying the binning algorithm from Table 4 to the respective slices. The small line plot features the value of the Shannon entropy for the shown distributions. The flow fields feature stronger blurring with increased grid spacing. The corresponding probability distributions feature a stronger skew and extend further towards negative velocity fluctuations on the fine grid. This extent may be assumed to be a consequence of the field being located in the channel center. The turbulent mean velocity profile predicts the maximum velocity at this location, which, following Prandtl's mixing-length hypothesis, leads to momentum exchange with regions closer to the wall due to turbulent transport statistically favoring momentum deficits in the center of the channel. The growth of the left flat part of the distribution in the direction of negative values is assumed to be a consequence of the increased grid resolution, allowing for more and smaller turbulent structures in the flow field. The distributions appear more symmetric on the coarser grids.

Fig. 5 Profiles of a-priori calculated entropies H_n(u″) and H_n(w″) with the wall distance y+ in viscous units. The y+ axis is cropped to allow for a good visualization of the channel center and near-wall region.

Fig. 6 Profiles of the viscous normalized streamwise velocity u+ and the resolved turbulent kinetic energy k_res with the wall distance y+ in viscous units at Re_τ = 395. The underresolution (coarse grids) is intended to obtain simulations of different quality.

Fig. 7 Instantaneous streamwise velocity fluctuation u′ normalized with the friction velocity u_τ, and the probability distributions based on the shown field of instantaneous streamwise velocity fluctuations at the channel half-height layer (x−y plane) for the grid widths 0.5, 1 and 2 mm at Re_τ = 395. The additional set of axes shows the entropy H_n(u′) of the three given fields arranged by grid size.
The Shannon entropy of the resolved fluctuating velocity components u′, v′ and w′ is calculated, yielding the information entropy of the streamwise component H(u′), the wall-normal component H(v′) and the spanwise component H(w′). Sufficient resolution of the near-wall stresses is crucial for achieving good agreement between channel-flow LES and DNS. The information entropy is therefore calculated as a function of the wall distance using wall-parallel slices of the domain, to assess whether statements about resolution quality can be derived from this quantity.
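The slice-based evaluation described above can be sketched in a few lines of Python. Note that the binning here is a plain fixed-width histogram, not the paper's own binning algorithm (given in its Table 4), and the flow field is replaced by a random stand-in, so the numbers are purely illustrative.

```python
import numpy as np

def shannon_entropy_normalized(samples, n_bins=64):
    """Normalized Shannon entropy H_n of a sample set via histogram binning.

    A simple fixed-width binning is assumed; bin count and edges are
    illustrative and differ from the paper's Table 4 algorithm.
    """
    counts, _ = np.histogram(samples, bins=n_bins)
    p = counts[counts > 0] / counts.sum()   # empirical probabilities
    h = -np.sum(p * np.log(p))              # Shannon entropy
    return h / np.log(n_bins)               # normalize by maximum entropy

# Hypothetical wall-parallel slice: u' fluctuations at one wall distance
rng = np.random.default_rng(0)
u_prime = rng.normal(0.0, 1.0, size=(128, 128))  # stand-in for an LES slice
print(shannon_entropy_normalized(u_prime.ravel()))
```

A laminar (single-valued) slice gives H_n = 0, a uniform occupation of all bins gives H_n = 1, consistent with the limiting cases discussed below.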
The results are shown in Fig. 8 for six resolution levels. Particularly coarse grids are included to examine whether the entropy can reveal poor resolution through characteristic features. The graphs of H_n(v′) and H_n(w′) have a similar shape: the entropy first rises with the wall distance and peaks after around 100 non-dimensional wall units. The profile of H_n(u′) features a maximum at 10 viscous wall units for the finely resolved simulations; approaching the centerline, the entropy values decrease further. The shape of the curve becomes less distinct the coarser the grid becomes. H_n(v′) and H_n(w′) show a similar behavior for larger y+, but a stronger sensitivity in the near-wall region. This similarity may be assumed to be a consequence of the similar behavior of the statistics in these periodic directions. The resolution quality of the near-wall region is vital for the quality of the simulation, which might make the information entropy suitable for assessing simulation quality.
In boundary-layer theory it is postulated that the flow is laminar close to the wall Schlichting and Gersten (2016). Hence, the velocity profile is not affected by the turbulent fluctuations, which would transform a single-valued velocity into a distribution. In the fully developed case, the flow can be seen as orderly there: for wall distances within the laminar sublayer, there is no uncertainty about which velocity value is found, as only one velocity value occurs. Thus H_n(u′) = 0 for a plane parallel to the wall at a fixed wall distance within the laminar sublayer. Increasing the wall distance and leaving the laminar sublayer, small turbulent structures arise and start to superimpose the velocity profile Pope (2001). The mean velocity is then superimposed by weak fluctuations induced by eddies of small size. For sufficient resolution of the near-wall region, the fluctuations lead to a greater number of velocity values besides the mean value and widen the distribution. This leads to a rising uncertainty and therefore an increase of the Shannon entropy, H_n(u′) > 0.
The graph on the middle-right side of Fig. 8 shows an evaluation of the turbulent kinetic energy ratio k_res/(k_res + k_sgs), where the subgrid turbulent kinetic energy was calculated following the ideas of Yoshizawa (1982), Lilly (1967) and Vreman et al. (1994) based on the turbulent viscosity obtained from the Sigma model. The ratio is shown versus y+ to investigate the relation between resolution and wall distance. Following the commonly used but often misleading recommendation of a ratio > 0.8 for sufficient LES quality, all grids would lead to satisfactory results, except for the 4 mm grid at a wall distance of around 100 wall units. Furthermore, it may be concluded that sampling the ratio in the wall-normal direction would have led to an overall ratio > 0.8 for all simulations, implying sufficient resolution even of the coarse grids, which has been proven wrong during the evaluation of Fig. 6. Even though the ratio indicates the right trend of resolution and result quality for the fine 0.5, 1 and 2 mm grids, the criterion fails to warn against actually poor resolution, a key property for a useful quality indicator.
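As a point of reference for this criterion, a minimal sketch of the ratio k_res/(k_res + k_sgs) is given below. The velocity fields and the subgrid value k_sgs are synthetic placeholders, not output of the Sigma model, so only the structure of the evaluation is meaningful.

```python
import numpy as np

def resolved_tke_ratio(u, v, w, k_sgs):
    """Ratio of resolved to total turbulent kinetic energy on one slice.

    k_res = 0.5 * <u'_i u'_i> from the resolved fields; k_sgs would come
    from the subgrid model (here simply passed in as a placeholder array).
    """
    k_res = 0.5 * sum(np.var(c) for c in (u, v, w))  # resolved TKE
    return k_res / (k_res + np.mean(k_sgs))

# Illustrative fields: well-resolved slice with a small modeled contribution
rng = np.random.default_rng(0)
u, v, w = (rng.normal(0.0, 1.0, (64, 64)) for _ in range(3))
ratio = resolved_tke_ratio(u, v, w, k_sgs=np.full((64, 64), 0.1))
print(ratio, ratio >= 0.8)  # compared against the commonly quoted threshold
```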
The LES IQ is shown in the bottom-left graph. A value greater than 0.8 is suggested for an LES to be considered properly resolved, while a value of 0.952 corresponds to a fully resolved DNS. The finely resolved simulation of 0.5 mm grid size achieves the highest values, almost reaching a constant value of 0.952, which would imply DNS-like resolution. This finding, however, cannot be fully confirmed by the velocity and turbulent kinetic energy profiles. Since even the very poorly resolved grids show values above 0.8 (albeit lower ones), it might be concluded that the LES IQ struggles with strong underresolution. However, compared to the energy ratio, the LES IQ shows some form of convergence for the poorly resolved grids. The bottom-right graph shows the subgrid-activity parameter s. The parameter s shows a stronger sensitivity towards the subgrid model contribution for the fine grids. The overall behavior of the two indicators appears very similar with respect to the curvature of the graphs, with the subgrid-activity parameter being stretched over a larger range of values. Hence it may be concluded that similar insights can be obtained from both measures.

[Fig. 8: Entropy results H_n(u′), H_n(v′), H_n(w′) and the ratio of resolved and total kinetic energy k_res/(k_res + k_sgs) based on Eq. (1), the LES IQ by Celik et al. (2005) and the subgrid-activity parameter s suggested by Geurts and Fröhlich (2002) with the wall distance y+ in viscous units at Re_τ = 395.]
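For reference, the viscosity-based variant of the LES IQ can be written down compactly. The constants alpha = 0.05 and n = 0.53 are the calibration values we associate with Celik et al. (2005), quoted from memory and therefore to be checked against the original; the formula does reproduce the DNS value 1/1.05 ≈ 0.952 quoted above when the effective viscosity equals the molecular one.

```python
def les_iq(nu_eff, nu, alpha=0.05, n=0.53):
    """Viscosity-based index of quality in the spirit of Celik et al. (2005).

    nu_eff: effective (turbulent plus numerical) viscosity; nu: molecular
    viscosity. alpha and n are assumed calibration constants (see lead-in).
    """
    return 1.0 / (1.0 + alpha * (nu_eff / nu) ** n)

print(les_iq(nu_eff=0.0, nu=1.5e-5))     # vanishing model contribution -> 1.0
print(les_iq(nu_eff=1.5e-5, nu=1.5e-5))  # nu_eff == nu -> 1/1.05, about 0.952
```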
To further investigate the connection between the Shannon entropy and the turbulent kinetic energy, the entropy of the absolute velocity H_n(|u⃗′|) is plotted over the resolved turbulent kinetic energy k_res in Fig. 9. Both quantities are obtained from the wall-parallel cross-sections for each distance from the wall, up to the channel half-height. The behavior of both quantities is linked, as the entropy grows with increasing turbulent kinetic energy. The resolved turbulent kinetic energy k_res = 1/2 u′_i u′_i is dominated by the streamwise velocity fluctuations Rieth et al. (2014). It has been discussed before that the uncertainty reaches its maximum where the turbulent kinetic energy is strongest. Therefore, the entropy H_n(|u⃗′|) and the energy k_res share a similar contour when plotted versus the wall distance and must show some correlation.
While a correlation between the entropy H_n(|u⃗′|) and the turbulent kinetic energy k_res has been postulated, the maxima of both entropy and energy occur at around 10 viscous wall units on the fine grids. Both measures rely on the rms of the velocity fluctuations in the streamwise, wall-normal and spanwise direction. As the turbulent kinetic energy in a channel flow is dominated by the streamwise velocity fluctuation, the location of the maximum of k_res is dictated by u′ Davidson (2009). The maximum of H_n(|u⃗′|), however, depends only on the range of different fluctuation values that can be observed at a given wall distance. H_n(|u⃗′|) will reach large values if a wide set of different fluctuation values |u⃗′| can be observed. Thus, the entropy increases if u′, v′ and w′ contribute different magnitudes. Consequently, the information entropy H_n(|u⃗′|) reaches a minimum value in the isotropic case. The velocity fluctuations u′, v′ and w′ tend to achieve values of similar magnitude closer to the centerline and can be considered more isotropic than closer to the wall Pope (2001); Kim et al. (1987). This agrees with the decrease of H_n(|u⃗′|) for wall distances y+ > 100. At the wall, the velocity fluctuations approach zero due to the no-slip condition. With increasing wall distance, the velocity fluctuations u′, v′ and w′ start to grow at different rates and differ more. This leads to a wider range of events, which matches the rising slope of H_n(|u⃗′|).

[Fig. 9: Profile of the entropy of the absolute velocity fluctuations H_n(|u⃗′|) with the wall distance y+ (left) and dependency between the entropy of the absolute velocity fluctuations H_n(|u⃗′|) and the resolved turbulent kinetic energy k_res (right) at Re_τ = 395. The marker size (from large to small) indicates the proximity to the wall in viscous wall units for a discrete chosen set of points with y+ = 30, 100, 200.]
The link between the Shannon entropy and the turbulent kinetic energy is visualized in Fig. 9. The entropy of the magnitude of the vector of velocity fluctuations is plotted as a function of the turbulent kinetic energy; the wall distance acts as the parameter along the profile. Small distances to the wall are indicated by larger dots. The finer grids reveal a proportional increase between entropy and turbulent kinetic energy, as can be expected from the similar behavior in the near-wall region, leading to a straight line close to the angle bisector. The curve then proceeds towards lower energies with comparably higher entropies. Hence, the fine grids tend to reveal a higher value of entropy for similar values of turbulent kinetic energy.
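The joint growth of H_n(|u⃗′|) and k_res can be mimicked with synthetic isotropic fluctuations of increasing rms, provided one assumption is made explicit: the bin edges must be shared across all wall distances, since the normalized entropy of a Gaussian signal with per-slice binning would be scale-invariant. The edges and rms levels below are illustrative choices, not values from the paper.

```python
import numpy as np

def entropy_fixed_bins(samples, edges):
    """Normalized Shannon entropy with globally fixed bin edges."""
    counts, _ = np.histogram(samples, bins=edges)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(len(edges) - 1)

edges = np.linspace(0.0, 4.0, 65)          # shared across all "wall distances"
rng = np.random.default_rng(0)

entropies, energies = [], []
for rms in (0.2, 0.5, 1.0):                # stand-in for increasing k_res
    u = rng.normal(0.0, rms, (3, 20000))   # three fluctuation components
    speed = np.linalg.norm(u, axis=0)      # |u'|
    energies.append(1.5 * rms**2)          # k_res = 3/2 rms^2 (isotropic)
    entropies.append(entropy_fixed_bins(speed, edges))

print(list(zip(energies, entropies)))      # entropy grows with k_res
```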

A-Posteriori LES of the Channel Flow at Re_τ = 934
To substantiate the findings from the previous Sect. 4.2, another set of simulations has been carried out at the higher Reynolds number Re_τ = 934 using the same setup. A refined grid of 0.25 mm width has been added to account for the decreased boundary-layer thickness, achieving a resolution of 3.9 viscous wall units, while the coarsest grid of 8 mm is omitted in this case. Figure 10 shows the viscous normalized velocity profile and the turbulent kinetic energy over the wall distance in viscous wall units, featuring the reference data of Hoyas and Jiménez (2006). The grids of up to 2 mm width show good agreement with the DNS data, while the coarser grids lead to a stronger overprediction of the velocity profile. All grids predict the resolved turbulent kinetic energy well, with the wall-nearest point on the coarser grids showing stronger deviations. Again, the results from (too) coarse grids are kept to also show the effect of poor resolution.
In Fig. 11, visualizations of the flow field of the instantaneous streamwise velocity fluctuations u′, along with the corresponding probability distribution in the channel center, are shown for the three finest grids in the same manner as in Fig. 7. The flow fields appear more blurred for the coarser grids; however, the overall effect is not as distinctive as for the lower Reynolds number. The obtained probability distribution shows a stronger skew towards negative fluctuations on the finely resolved 0.25 mm grid, as opposed to the coarser grids, where the distribution appears rather symmetrical. Figure 12 shows the profiles of the information entropies H_n(u′), H_n(v′) and H_n(w′) as well as the ratio of the resolved and modeled turbulent kinetic energy k_res/(k_res + k_sgs), the LES IQ and the subgrid-activity parameter s. The entropy of the streamwise velocity fluctuations H_n(u′) features a similar shape compared to the lower Reynolds number, with one maximum at y+ ≈ 10 and a local maximum closer to the channel center. The entropies of the wall-normal and spanwise velocity fluctuations H_n(v′) and H_n(w′) show an increase from lower to higher wall distances. Again, the entropy of the wall-normal fluctuation reveals a strong drop closer to the wall, which is a consequence of the conservation of mass prohibiting wall-normal velocity components close to the wall and hence reducing the wall-normal velocity fluctuations. The shape of the entropy curves becomes less distinct with reduced grid resolution; however, especially the H_n(u′) and H_n(v′) profiles feature a clear trend of curve development when increasing the grid resolution.

[Fig. 10: Profiles of viscous normalized streamwise velocity u+ and the resolved turbulent kinetic energy k_res with the wall distance y+ in viscous units at Re_τ = 934. The underresolution (coarse grids) is intended to obtain simulations of different quality.]

[Fig. 12: Entropy results H_n(u′), H_n(v′), H_n(w′) and the ratio of resolved and total kinetic energy k_res/(k_res + k_sgs) based on Eq. (1), the LES IQ by Celik et al. (2005) and the subgrid-activity parameter s suggested by Geurts and Fröhlich (2002) with the wall distance y+ in viscous units at Re_τ = 934.]
The ratio of the resolved and the modeled turbulent kinetic energy can be found in the center-right plot. The finely resolved grids reveal very low contributions of the modeled turbulent kinetic energy. The ratio of resolved turbulent kinetic energy decreases progressively as the grid is coarsened. Following the recommendation that the resolved turbulent kinetic energy should make up at least 0.8 of the total kinetic energy, the coarsest grids can only be considered well resolved near the centerline or close to the wall. As revealed by the velocity profile in Fig. 10, however, the resolution of both grids must be considered poor overall for all wall distances. The LES IQ reveals a similar trend, with the finest grid achieving very high values close to 0.952, which can be considered DNS resolution. Celik et al. (2005) suggest a value of at least 0.8 for an LES to be well resolved, and all calculations achieve values of at least 0.86 for all wall distances. The subgrid-activity parameter can be found in the bottom-right plot. It features a similar trend to the ratio of kinetic energy and the LES IQ, with the finest grid revealing the lowest subgrid activity. While the subgrid indicator is often used as a measure for assessing the contribution of the subgrid model, especially the criteria based on the turbulent kinetic energy ratio and the LES IQ do not fully represent the findings from the velocity profile in Fig. 10, as the agreement with the DNS results does not necessarily improve with grid refinement.
To assess the link between the turbulent kinetic energy and the information entropy, the entropy of the norm of the fluctuation vector and the turbulent kinetic energy can be found in Fig. 13. The curve of the entropy H_n(|u⃗′|) features a maximum at y+ ≈ 10 and an overall decrease for larger distances from the wall, similar to the findings for the lower Reynolds number. Plotting the entropy versus the turbulent kinetic energy again reveals a bisector-like behavior for the lower wall distances on the finer grids. Beyond the peak values of energy and entropy, the proportional behavior changes and leads to a weaker increase of entropy with increasing kinetic energy, as already observed for the lower Reynolds number in Fig. 9.

[Fig. 13: Profile of the entropy of the absolute velocity fluctuations H_n(|u⃗′|) with the wall distance y+ (left) and dependency between the entropy of the absolute velocity fluctuations H_n(|u⃗′|) and the resolved turbulent kinetic energy k_res (right) at Re_τ = 934. The marker size (from large to small) indicates the proximity to the wall in viscous wall units for a discrete chosen set of points with y+ = 50, 100, 200.]

All entropy profiles shown in Figs. 8/12 and 9/13 reveal a dependence on the grid size and, more importantly, show a trend with decreasing (i.e. poorer) resolution: while the single entropies of u′, v′ and w′ served as an introductory example, it may be argued that the entropy of the absolute fluctuation value H_n(|u⃗′|) is a more suitable representative for generating statements about the simulation. Indeed, the simulations which can be considered sufficiently resolved for the channel flow feature similar behavior in the peak region and show the predicted decrease of entropy towards the wall for H_n(|u⃗′|). While measures based on the turbulent kinetic energy are often criticized for generating misleading statements about the simulation quality Davidson (2009); Nguyen and Kempf (2017), as emphasized in Figs. 6/10, an entropy measure based on the total fluctuation value may be considered, as it obtains its information about the flow physics from the statistical structure of the flow field rather than from absolute values.

A-Posteriori LES of the Plane Turbulent Jet

Figure 14 gives a visual impression of the jet for different mesh qualities: it shows instantaneous fields of the passive scalar and the corresponding probability distributions used for the calculation of entropy, along with the information entropy calculated from these distributions. The finest grid with 20 cells per nozzle height D reveals the finest structures in the flow field, which gradually disappear while coarsening the mesh to ten and five points; again, a coarse grid has been used intentionally for testing the behavior of the information entropy. The LES resolution is chosen to allow for a sufficient scale separation between LES and DNS and, at the same time, to provide a sufficient resolution of the shear layer. Figure 15 shows the streamwise velocity fluctuations √(u′u′) and the passive scalar fluctuations over the distance y/D from the centerline. Profiles of the mean velocity and the mean scalar are commonly considered easy to predict, so that the focus lies on the fluctuations, which provide a more challenging test for the simulation. The agreement with the DNS data deteriorates with coarser grids, characterized by high fluctuation values at larger distances from the centerline. While there is little difference between the 0.5 and 1 mm grids, deviations from the DNS data are significantly stronger for the 2 mm grid. Given the fact that the simulation results for the two finer grids show a similar match with the DNS data and doubling the grid size leads to a notable drop in quality, the 2 mm grid may be considered insufficiently resolved.
The information entropy has been evaluated for the streamwise velocity fluctuations u′, the absolute value of the velocity fluctuation vector |u⃗′| and the passive scalar fluctuations. The results are shown in Fig. 16, and the turbulent-kinetic-energy-based criterion [Eq. (1)] has been added. Since the complete DNS database is available, results of the Shannon entropy at DNS resolution have been added as well.
The overall impression from the results reveals the entropy as an actually converging measure, since the differences between DNS and LES results decrease with grid refinement. Strong differences in the velocity-fluctuation-based entropies H_n(u′) and H_n(|u⃗′|) can be observed between the fine and coarse grids. The velocity-fluctuation-based entropies reveal high values near the centerline, representing the strong shear also found in the previous fluctuation profiles, which goes hand in hand with higher overall fluctuation values and hence more possible velocity values, leading to a wider range of events and an increase in information entropy. With decreasing shear, the entropy also reduces, but increases again at larger distances y/D, which are governed by large-scale vortices that lead to an increase of entropy. The decrease of entropy near y/D = 10 is considered to be a result of the boundary conditions and hence is not discussed any further. The entropy profile of the scalar fluctuations shows a decrease with increasing centerline distance, which is assumed to be a result of vortices in the region of strong shear. The profile of k_res/(k_res + k_sgs) shows strong differences between all three mesh resolutions. While quantitative deviations between the 0.5 and 1 mm grids have been shown to be small in Fig. 15, the criterion states strong differences in resolution quality up to medium distances from the centerline. The 2 mm grid achieves values below the suggested limit of 0.8 for y/D < 2, but the criterion considers the simulation sufficiently resolved at larger distances from the centerline. Similar observations can be made for the subgrid-activity parameter s, which predicts overall lower subgrid contributions for the finer grids. A different trend can be observed for the LES IQ: it claims overall better resolution on the coarser grids, which contradicts the findings from Fig. 15, especially on the 2 mm grid. This indicator also predicts poorer resolution at larger distances y/D, while the actual agreement between the LES and DNS data is improving according to Fig. 15.

Summary and Conclusion
The utility of the Shannon entropy has been studied for different flow variables in different canonical flows in LES. Calculations over a wide range of grid resolutions have been performed to provide a set of differently resolved simulations, for which the Shannon entropy has been evaluated. The Shannon entropy is a simple measure, which can be calculated during a simulation or in post-processing, without the need to perform further simulations with modified parameters or grid sizes. The Shannon entropy obtains its information about the flow physics from the statistical structure of the PMF and the underlying flow field rather than from its absolute values, which provides a different viewpoint.

[Fig. 16: Entropy results H_n(u′), H_n(|u⃗′|) and the entropy of the scalar fluctuations, the ratio of resolved and total kinetic energy k_res/(k_res + k_sgs) based on Eq. (1), the LES IQ by Celik et al. (2005) and the subgrid-activity parameter s as suggested by Geurts and Fröhlich (2002) with the distance y/D from the centerline. H_n has been calculated on fields in the homogeneous and streamwise directions for each y/D.]
General observations about the Shannon entropy are that (a) it drops off approaching the wall and towards the center of a channel, with the finest simulations achieving the lowest information entropy values at the wall, and that (b) in a free shear layer the information entropy seems to drop with increasing grid resolution, so that overall, achieving a low Shannon entropy may imply a good resolution. For the underlying PMF, this would require a probability distribution governed by only a few dominant modes. This would imply that the probability distribution reflects characteristics such as a characteristic state of mixing, a flow field characterized by vortices of a certain length scale, or a state of thermal equilibrium.
Low values of information entropy are achieved for distributions which are governed by only a few modes with high probabilities. Looking at a flow field, such a field would correspond to locally identical values, which can be considered as "steps" in spatial profiles, that is, with strongly localized gradients and curvature: relatively large zones of constant value must be separated by thin layers of steep gradients and strong curvature to achieve a low entropy. Requiring locally either very strong or very low gradients, it may be argued that the gradient field must also be composed of only a few modes. An extreme example of such a low-entropy field would be a very thin shear layer, right at a splitter plate, where only a very thin interface separates a fast fluid A from a slow fluid B. After some mixing, the homogeneous zones of values of just A or B will be separated by a thicker mixing layer, generating more states of mixture and hence a higher entropy. Thinking about LES quality, such a field can only be maintained by very good numerical resolution: maintaining steps requires much better numerical resolution than just maintaining minima and maxima. In that sense, entropy could be seen as a more stringent quality criterion than resolved energy. In fact, low-entropy "steps" require that multiple neighbouring cells take almost the same value, for example by sharing the same eddy. This implies that low entropy can only be achieved if eddies (of homogeneous states) are resolved by two or more points, a sensible requirement for an LES! One might therefore argue that information entropy could indicate whether a grid can actually resolve eddy structures, which may be more than other criteria can provide.
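The splitter-plate picture can be made concrete with two one-dimensional model profiles: a sharp step (thin interface between fluids A and B) and a tanh-shaped mixing layer. Profile shapes, layer thickness and bin count below are illustrative choices, not quantities from the paper.

```python
import numpy as np

def h_n(field, n_bins=32):
    """Normalized Shannon entropy of a field via fixed-width binning."""
    counts, _ = np.histogram(field, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p)) / np.log(n_bins)

x = np.linspace(-1.0, 1.0, 2000)
sharp = np.where(x < 0.0, 0.0, 1.0)      # thin interface: fluid A | fluid B
mixed = 0.5 * (1.0 + np.tanh(x / 0.3))   # thick mixing layer between A and B

print(h_n(sharp), h_n(mixed))            # the mixed profile has higher entropy
```

The step yields only two occupied bins (H_n = log 2 / log 32 = 0.2), while the mixing layer populates many intermediate states of mixture and therefore a higher entropy, in line with the argument above.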
Two test cases for a-posteriori LES were chosen to cover two common canonical flows established for the assessment of new models, with the turbulent channel flow representing wall-bounded flows and the plane turbulent jet representing free shear flows. The cases were designed in such a way that no effects of different boundary conditions or additional modeling, such as wall-modeling, should be observed. In the case of the channel flow, wall-modeling might be required for coarse grids. It might be argued that this error is introduced not by the lack of an additional wall-model, but by the lack of sufficient resolution in the near-wall region. Therefore, in this study the resulting error is treated as a consequence of underresolution, which was in fact intended on the coarse grids to demonstrate its effect on the information entropy. Both cases feature Reynolds numbers which are considered to be lower than in typical technical applications. Nevertheless, turbulent channel flows at the present Reynolds numbers form a popular choice in the context of LES modeling assessment Piomelli et al. (1988); Klein et al. (2020); Hasslberger et al. (2021), as they allow for good resolution of the near-wall region at moderate computational costs using Cartesian grids and therefore afford a broad spectrum of grid qualities. Plane turbulent jets are a canonical case featuring the prominent challenge of choosing the right boundary treatment due to the presence of entrainment phenomena. While there are several cases available in the literature Klein et al. (2003); Stanley et al. (2002), possible differences in boundary treatment may introduce additional undesired errors Engelmann et al. (2021). To avoid these errors, a simple DNS case has been designed and all calculations have been performed using identical numerical settings.
While the Reynolds number in this case is moderate in comparison with technical applications, the case is considered relevant for canonical LES studies and the computation on a Cartesian grid remains affordable.
Considering the potential of the Shannon entropy as a future LES quality indicator, we note that the Shannon entropy profiles reveal a converging trend with increasing resolution (towards the limit of DNS), whereas the ratio of resolved and total turbulent kinetic energy struggled with assessing the poor simulations; we do, however, want to mention that other insights might be obtained from vorticity- or enstrophy-based criteria, which were not in the focus of this study. This may imply that, besides being an interesting quantity for analyzing and interpreting turbulent flows, the Shannon entropy could help to assess the quality of LES. We would however like to point out that much further research will be needed to establish a reliable quality criterion. At the same time, all statements about resolution quality derived from the Shannon entropy at this early stage have to be assessed with care.

Simplified Introduction to Information Entropy
The information entropy H can be seen as a measure quantifying the predictability of a discrete statistical experiment, which is also referred to as the uncertainty of the experiment. Statistical experiments with only one outcome may be considered fully predictable, or experiments with no uncertainty about the outcome. A statistical experiment may be seen as least predictable, or most uncertain, if every outcome has the same probability.
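Written out, and consistent with the normalized form used in Eq. (5) below, the entropy of a discrete experiment with N outcomes of probabilities p_i reads:

```latex
H = -\sum_{i=1}^{N} p_i \log_{10} p_i , \qquad
H_n = \frac{H}{\log_{10} N} \in [0,1] \quad (N > 1),
```

so that H_n = 0 marks a fully predictable experiment and H_n = 1 a uniform one; for N = 1 the entropy is zero and H_n is taken as zero by convention.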
For the sake of simplicity, assume having a set of marbles differing only by color. A simple experiment can be conducted by filling a bag with five marbles. In case A (Fig. 17, left), the bag shall be filled with five marbles of the same color. When reaching for a marble from the bag, the outcome of the experiment is already known. As only blue marbles exist in the bag, the result blue marble is fully predictable and hence may be seen as without uncertainty. Equation (5) can be used to assess the value of entropy for this experiment. The number of statistical events N equals 1, as there is only the event blue marble, and the respective probability is p = 1. Hence, H = −1 ⋅ log10(1) = 0 and, by convention, H_n = 0. This result can be interpreted as zero uncertainty about the outcome of the experiment.
The opposite scenario can be achieved by filling the bag with five marbles of different colors, which is referred to as case B (Fig. 17, right). The new setup leaves no tendency towards a marble of a certain color, as all colors appear with the same probability. For this setup using five marbles, the predictability reaches a minimum and the uncertainty of the outcome reaches its maximum. Hence, the information entropy as a measure of this uncertainty reaches its maximum according to Eq. (5): H_n = 5 ⋅ (−0.2 ⋅ log10(0.2))∕log10(5) = 1.
It can be observed from cases A and B that the value of entropy increases the more equally the probability is distributed over the statistical events, which is represented by adding more marbles of different colors. A more equal distribution can be achieved by adding more statistical events to the experimental setup, as demonstrated in Fig. 18. To the original five blue marbles, a purple marble is added, which reduces the predictability in the experiment. Drawing from the bag now leaves a small uncertainty about the outcome, since a purple marble might be the result of the drawing. This uncertainty is represented by an increase of the information entropy. Adding another marble of a different color to the bag decreases the probability of drawing a blue marble further and hence increases the information entropy. The addition of more marbles of different colors continuously reduces the original significance of the blue marbles by reducing the difference between the probability of drawing a blue marble and that of drawing another marble.

[Fig. 17: Discrete statistical experiment: bag filled with five marbles. Case A: every marble has the same color, so the outcome is fully predictable. Case B: every marble has a different color, so the outcome is non-predictable.]

[Fig. 18: Discrete statistical experiment: bag filled with five blue marbles. Case C: marbles of a new color are added incrementally to the bag and H_n is calculated at each step for the bag filled with the current set of marbles.]
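The three marble experiments can be reproduced numerically; the helper below evaluates the normalized entropy of Eq. (5) directly from a list of probabilities (the single-outcome case is set to zero by convention):

```python
import numpy as np

def h_n(probabilities):
    """Normalized Shannon entropy of a discrete distribution (base 10)."""
    p = np.asarray(probabilities, dtype=float)
    n = len(p)
    if n == 1:
        return 0.0  # single outcome: no uncertainty, by convention
    return float(-np.sum(p * np.log10(p)) / np.log10(n))

print(h_n([1.0]))       # case A: five blue marbles -> 0.0
print(h_n([0.2] * 5))   # case B: five different colors -> 1.0

# Case C: add marbles of new colors to the five blue ones, one at a time
for extra in range(1, 4):
    counts = np.array([5] + [1] * extra)
    print(extra, h_n(counts / counts.sum()))  # entropy rises with each color
```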
The difference between the standard deviation and the information entropy might already be guessed from the nature of the example chosen to introduce the information entropy. The information entropy is also referred to as rigorous due to its indifference towards the nature of the statistical event. In the context of information entropy it does not matter whether the statistical event is the amount of prize money in a tombola, hitting a certain number in a dice roll, or receiving a specific marble when drawing from a bag. The standard deviation, in contrast, is always bound to the mathematical value of a statistical event (a prize money, the number rolled with a dice, the age of a person picked for a social study) and scales with the orders of magnitude covered by the statistical events. Hence, it may be concluded that the information entropy is influenced only by the distribution of probabilities over the statistical events, while the standard deviation depends on this distribution as well as on the mathematical values of the events.