Mathematical bounds on Shannon entropy given the abundance of the ith most abundant taxon

The measurement of diversity is a central component of studies in ecology and evolution, with broad uses spanning multiple biological scales. Studies of diversity conducted in population genetics and ecology make use of analogous concepts and even employ equivalent mathematical formulas. For the Shannon entropy statistic, recent developments in the mathematics of diversity in population genetics have produced mathematical constraints on the statistic in relation to the frequency of the most frequent allele. These results have characterized the ways in which standard measures depend on the highest-frequency class in a discrete probability distribution. Here, we extend mathematical constraints on the Shannon entropy in relation to entries in specific positions in a vector of species abundances, listed in decreasing order. We illustrate the new mathematical results using abundance data from examples involving coral reefs and sponge microbiomes. The new results update the understanding of the relationship of a standard measure to the abundance vectors from which it is calculated, potentially contributing to improved interpretation of numerical measurements of biodiversity.

Supplementary Information: The online version contains supplementary material available at 10.1007/s00285-023-01997-3.


Introduction
The quantitative measurement of features of biological diversity is central to ecology. Over decades of analysis, many statistics have been proposed as diversity measures, and their mathematical properties have been studied (Pielou 1975; Magurran 2004; Leinster 2021).
Among the enduring measures of diversity in ecology is the Shannon entropy, first borrowed in the 1950s from the formula's origins in Shannon's information theory (Shannon 1948), and variously known as the Shannon diversity, Shannon index, Shannon-Weaver index, or Shannon-Wiener index (Spellerberg and Fedor 2003; Rodríguez et al. 2016; Sherwin and Fornells 2019). For a frequency vector p = (p_1, p_2, ..., p_n), the Shannon entropy is

H(p) = ∑_{i=1}^{n} h(p_i) = −∑_{i=1}^{n} p_i log p_i,

where each p_i is a non-negative quantity that, in biodiversity measurement, represents the relative abundance of species i in a community. The p_i sum to 1 and h(p_i) = −p_i log p_i. We use the base-e logarithm and adopt the convention of defining −0 log 0 = 0 (Leinster 2021, pp. 39-40). Shannon entropy has a number of convenient mathematical properties as a diversity measure for the species in a community. Considering all possible frequency vectors, it reaches its minimum of 0 when the vector has only one non-zero entry, with frequency 1. Its maximum of log n is reached when the distribution of probabilities across n categories is uniform; the upper bound therefore increases with the vector length n (Leinster 2021, pp. 41-42). In the language of biodiversity, Shannon entropy is large when a community contains many equally common species, and it is minimal when the community has only one species. The Shannon entropy can be linked to broader families of statistics, such as the Rényi entropies (Rényi 1961), for which it can be regarded as a limiting case, and the Hill numbers (Hill 1973; Jost 2006; Leinster and Cobbold 2012; Chao et al. 2014), for which its exponential e^H is a special case.
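The definition above can be computed directly. The following is a minimal sketch in Python (the function name is ours, not from the paper), using the convention −0 log 0 = 0 by skipping zero entries:

```python
import math

def shannon_entropy(p):
    """Base-e Shannon entropy H(p) = -sum(p_i * log(p_i)), with -0*log(0) = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Minimum 0 for a single-species community; maximum log n for a uniform one.
print(shannon_entropy([1.0, 0.0, 0.0]))          # 0.0
print(shannon_entropy([0.25] * 4), math.log(4))  # both are log 4 ≈ 1.386
```

Skipping entries with x = 0 in the generator expression is what implements the −0 log 0 = 0 convention without a special case.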
With its long-standing role as a popular diversity statistic, Shannon entropy is ubiquitous in biodiversity studies (Pielou 1975; Magurran 2004; Sherwin and Fornells 2019; Cushman 2021). Hence, new mathematical results concerning its behavior have the potential to assist in understanding features of numerous ecological communities, both in ongoing studies and in previously reported analyses that have relied upon this index.
A general aspect of diversity measurement is that a diversity statistic computed from frequency vectors, each representing the relative abundances of species in a community, can reach similar values for quite different species relative abundances. Consider two communities with different values for the Shannon entropy. Is the difference driven by abundance differences in one or two dominant species, or by differences in many less common species? Consider also two communities that have similar Shannon entropy values and whose abundances are similar only for the few dominant species that have the strongest influence on the numerical value of the statistic. Is the similarity meaningful in light of abundance differences among the rarer species?
We seek to provide insight into such questions by exploring the mathematical constraints, or bounds, imposed on Shannon entropy by the ith-most abundant species. That is, if we fix the frequency of the ith-most abundant species in a community but leave other frequencies free to vary, what are the largest and smallest possible values of Shannon entropy?
Working with the case of i = 1 in a population genetics context mathematically identical to that used in ecological diversity computations, Aw and Rosenberg (2018) noted that if the frequency p_1 of the largest value in a frequency vector is fixed, then Shannon entropy is bounded above both by log n and by a tighter bound, a certain function of p_1. Further, the value of p_1 produces a certain lower bound on Shannon entropy. Thus, with p_1 specified, Shannon entropy is constrained more tightly than the interval [0, log n]. If the Shannon entropy is computed in a community that possesses a single dominant species, then the placement of the Shannon entropy with respect to this tighter interval conditional on the abundance of the dominant species is perhaps a more meaningful value than its placement with respect to [0, log n]. To better inform comparisons of biodiversity measurement among communities, the bounds we provide on Shannon entropy in relation to the frequency of the ith-most abundant species clarify the dependence of Shannon entropy on the relative abundances of the various species, not only the most abundant one.

Bounds on Shannon entropy: the most abundant species
Similar values of Shannon entropy can be generated by quite different species composition vectors. For example, consider two communities, each with ten species in total. Community A has two moderately common species and eight rare species: one species at abundance 0.5, another at abundance 0.492, and eight rare species at abundance 0.001 each. Community B is dominated by a single species at abundance 0.85, with the remaining nine species each having abundance 1/60. These communities both have Shannon entropy H ≈ 0.75 despite having quite different composition. One way to contextualize the Shannon entropies of these two communities is to look at them in light of the upper and lower bounds on Shannon entropy conditional on the abundance of the most abundant species and the total number of species. This approach takes into account differing most-abundant-species abundances, allowing a researcher to understand if the values of Shannon entropy are chiefly a byproduct of the abundance of a single dominant species. Aw and Rosenberg (2018) established the bounds on Shannon entropy as a function of the greatest abundance (Corollary 3.16). Without loss of generality, we re-order the species relative abundance vector p such that p_1 ≥ p_2 ≥ … ≥ p_n. The distribution of abundance across the entries of the vector is constrained by two requirements: the entries must sum to 1, and p_i ≥ p_j if i < j.
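The near-equality of the two communities' entropies can be checked directly. The following is a quick Python sketch (the helper name is ours):

```python
import math

def H(p):
    # Base-e Shannon entropy with the convention -0*log(0) = 0.
    return -sum(x * math.log(x) for x in p if x > 0)

community_A = [0.5, 0.492] + [0.001] * 8   # two common species, eight rare ones
community_B = [0.85] + [1 / 60] * 9        # one dominant species

print(round(H(community_A), 2), round(H(community_B), 2))  # 0.75 0.75
```

Despite the very different shapes of the two abundance vectors, both entropies round to 0.75.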
Proposition 1 For a fixed value of the frequency p_1 of the most abundant species in a community with n species, the vector maximizing H is p*, where

p* = (p_1, (1 − p_1)/(n − 1), (1 − p_1)/(n − 1), ..., (1 − p_1)/(n − 1)).

The upper bound on H is

H ≤ −p_1 log p_1 − (1 − p_1) log [(1 − p_1)/(n − 1)].

Proposition 2 For a fixed value of the frequency p_1 of the most abundant species in a community with n species, the vector minimizing H is p**, where

p** = (p_1, p_1, ..., p_1, 1 − (⌈1/p_1⌉ − 1) p_1, 0, ..., 0),

with the first ⌈1/p_1⌉ − 1 entries equal to p_1. The lower bound on H is

H ≥ −(⌈1/p_1⌉ − 1) p_1 log p_1 − [1 − (⌈1/p_1⌉ − 1) p_1] log [1 − (⌈1/p_1⌉ − 1) p_1].

In general, Shannon entropy is greatest when the distribution of species is as "even" as possible, reaching its maximum log n across all n-species abundance distributions if the n species each have abundance 1/n. In Proposition 1, if p_1 is fixed, then Shannon entropy is maximized when the remaining abundance, 1 − p_1, is spread evenly across all n − 1 remaining species.
On the other hand, Shannon entropy is smallest when the distribution of species is as "uneven" as possible. Across all n-species abundance distributions, this minimum is obtained if a single species has abundance 1 and all other species have abundance 0. In Proposition 2, for fixed p_1, Shannon entropy is minimized when the remaining abundance, 1 − p_1, is distributed across as few species as possible. If p_1 ≥ 1/2, then this minimizing vector is simply (p_1, 1 − p_1, 0, ..., 0). If, instead, p_1 < 1/2, then the condition p_1 ≥ p_2 ≥ … ≥ p_n requires that none of the subsequent vector entries exceed p_1. The largest abundance we can assign to any one species is p_1, and we repeat this assignment as many times as possible before assigning all remaining abundance to (at most) one last species.
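Under our reading of Propositions 1 and 2, the bounds given p_1 can be sketched as follows (Python; the function names are ours, and the formulas are our transcription of the propositions, not code from the paper):

```python
import math

def h(x):
    # One entry's contribution -x*log(x), with -0*log(0) = 0.
    return -x * math.log(x) if x > 0 else 0.0

def upper_bound_p1(p1, n):
    """Prop. 1: remaining mass 1 - p1 spread evenly over the other n - 1 species."""
    return h(p1) + (n - 1) * h((1 - p1) / (n - 1))

def lower_bound_p1(p1):
    """Prop. 2: ceil(1/p1) - 1 entries equal p1, remainder in one last species."""
    k = math.ceil(1 / p1) - 1
    return k * h(p1) + h(1 - k * p1)

# Community B from the text is exactly the maximizing vector for p1 = 0.85, n = 10,
# so its entropy (about 0.75) sits on the upper bound.
print(round(upper_bound_p1(0.85, 10), 2))  # 0.75
print(round(lower_bound_p1(0.5), 3))       # log 2 ≈ 0.693
```

At p_1 = 1/n, the upper bound reduces to log n, the uniform-distribution maximum, consistent with the general bound discussed above.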
The upper and lower bounds on Shannon entropy are plotted as a function of p_1 for varying values of n, the number of species, in Fig. 1. Both bounds decrease with increasing p_1; the lower bound has a piecewise structure, reflecting the fact that fewer species have non-zero abundance as p_1 increases. Across panels with increasing n, the upper bound increases; the lower bound for a given p_1 remains the same, except that its domain grows with n.
To compare Communities A and B, we direct our attention to the vertical cross sections at p_1 = 0.5 and p_1 = 0.85 of Fig. 1G, which shows the bounds on Shannon entropy conditional on p_1 for communities with n = 10 species. The large space between the upper and lower bounds when p_1 = 0.5 suggests that the value of Shannon entropy for Community A is not tightly constrained by its most abundant species. On the other hand, Community B, with a single dominant species, falls towards the right-hand side of Fig. 1G, with p_1 = 0.85. In general, large values of p_1 tightly constrain Shannon entropy; for each n ≥ 3 in Fig. 1, the space between the upper and lower bounds at p_1 = 0.5 is larger than at p_1 = 0.85. Thus, because Community B has one dominant species, its Shannon entropy is almost exclusively determined by the relative abundance of that species, and the abundances of rarer species could change without substantially influencing the Shannon entropy. This effect, in which the Shannon entropy is constrained by a dominant species, is most pronounced with small species richness n.

Fig. 1 Upper and lower bounds on Shannon entropy as functions of p_1 for fixed n. The bounds are taken from Propositions 1 and 2. As the number of entries, n, increases, the upper bound, log n, increases, so the range of the y-axis increases. Note that panels H and I have y-axis scales that differ from those of the other panels.

Bounds on Shannon entropy: the ith-most abundant species
Now consider a community with multiple dominant species, rather than just one.How do the values of the second, third, or, in general, ith-most abundant species constrain Shannon entropy?
We now present our new bounds on Shannon entropy as a function of the ith-greatest species abundance. We saw in the i = 1 case that entropy is maximized when abundances are as evenly distributed across the entries of the vector as possible. We construct the maximizing vector for i ≥ 2 similarly, setting every species abundance before or after the ith equal to p_i; whether the entries before or after the ith one equal p_i depends on the length of the vector and the value of p_i, since the vector's sum cannot be greater than 1. All remaining vector entries are set equal to each other.

Fig. 2 Upper and lower bounds on Shannon entropy from Theorems 3 and 4 as functions of the abundance of the second-most abundant species, p_2, for fixed n. The point on every plot with the highest Shannon entropy lies at (1/n, log n). Because p_2 is the second-most abundant species, it cannot exceed 1/2. Note that panels H and I have y-axis scales that differ from those of the other panels.
In the i = 1 case, we observed that entropy is minimized when as few species as possible possess non-zero abundances.We thus construct the minimizing vector by setting the abundances of the second through the ith species equal to p i , placing all remaining weight in the first species.
The bounds are proven in Appendix A.
Theorem 3 For a fixed value p_i of the frequency of the ith-most abundant species in a community with n species, with i ≥ 2, the vector maximizing H is p̄, where

p̄ = (p_i, ..., p_i, (1 − i p_i)/(n − i), ..., (1 − i p_i)/(n − i)) if p_i ≥ 1/n,
p̄ = ((1 − (n − i + 1) p_i)/(i − 1), ..., (1 − (n − i + 1) p_i)/(i − 1), p_i, ..., p_i) if p_i < 1/n.

In the p_i ≥ 1/n case, i entries equal p_i. In the p_i < 1/n case, i − 1 entries equal (1 − (n − i + 1) p_i)/(i − 1). The upper bound on H is

H ≤ i h(p_i) + (n − i) h((1 − i p_i)/(n − i)) if p_i ≥ 1/n,
H ≤ (i − 1) h((1 − (n − i + 1) p_i)/(i − 1)) + (n − i + 1) h(p_i) if p_i < 1/n.

Theorem 4 For a fixed value p_i of the frequency of the ith-most abundant species in a community with n species, with i ≥ 2, the vector minimizing H is p̲, where

p̲ = (1 − (i − 1) p_i, p_i, p_i, ..., p_i, 0, 0, ..., 0),

and i − 1 entries equal p_i. The lower bound on H is

H ≥ h(1 − (i − 1) p_i) + (i − 1) h(p_i).

We explore these bounds visually in Figs. 2 and 3. Figure 2 gives the upper and lower bounds on Shannon entropy as a function of p_2, the abundance of the second-most abundant species, for varying vector lengths n. As in Fig. 1, the upper bound increases with increasing n, and the lower bound remains the same irrespective of the value of n. As p_2 increases toward its maximum of 1/2, the Shannon entropy becomes tightly constrained, with the upper and lower bounds approaching the same point (1/2, log 2). We examine the bounds on Shannon entropy conditional on the ith-largest entry for vectors of length n = 2, 3, 4, and 5 in Fig. 3. For all four panels, the shapes outlined by the bounds for fixed p_1 are identical to the corresponding panels in Fig. 1, and the shapes outlined by the bounds for fixed p_2 match corresponding panels in Fig. 2. For all i, as n increases from n = i, the upper bound on H with respect to p_i increases; the lower bound remains the same.
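As we read Theorems 3 and 4, the bounds conditional on p_i for i ≥ 2 can be sketched as follows (Python; the function names are ours, and the formulas are our transcription of the theorem statements):

```python
import math

def h(x):
    # One entry's contribution -x*log(x), with -0*log(0) = 0.
    return -x * math.log(x) if x > 0 else 0.0

def upper_bound_pi(p_i, i, n):
    """Theorem 3: mass distributed as evenly as the ordering constraint allows."""
    if i < n and p_i >= 1 / n:
        # i entries equal p_i; the remaining n - i share 1 - i*p_i evenly.
        return i * h(p_i) + (n - i) * h((1 - i * p_i) / (n - i))
    # i - 1 entries share the excess evenly; entries i through n all equal p_i.
    return (i - 1) * h((1 - (n - i + 1) * p_i) / (i - 1)) + (n - i + 1) * h(p_i)

def lower_bound_pi(p_i, i):
    """Theorem 4: all mass beyond i - 1 copies of p_i placed in the first species."""
    return h(1 - (i - 1) * p_i) + (i - 1) * h(p_i)

# At p_2 = 1/2 the bounds pinch together at (1/2, log 2), as described in the text.
print(round(upper_bound_pi(0.5, 2, 10), 3), round(lower_bound_pi(0.5, 2), 3))
```

Note that the lower bound of Theorem 4 does not depend on n, matching the observation in the text that only the upper bound moves as n increases.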
Examining the regions between the upper and lower bounds in the (p_i, H)-plane for distinct values of i with fixed n, we observe a number of patterns. For n = 2, the regions overlap only trivially, at the point (1/2, log 2) in Fig. 3A (Proposition B.1). For n = 3, the overlap is also trivial, occurring along a curve; for n = 3, the upper bound on Shannon entropy given p_2 overlaps exactly with the lower bound on Shannon entropy conditional on p_3 from 0 to 1/3, then with the lower bound given p_1 from 1/3 to 1/2 (Fig. 3B, upper bound of the turquoise region; Proposition B.2). For fixed n ≥ 4, regions for differing i begin to have nontrivial overlap. For 2 ≤ i ≤ n − 2, the region between the bounds conditional on p_i overlaps with the region between the bounds conditional on p_1 (Fig. 3D, the turquoise and yellow regions overlap the navy region; Proposition B.6). The upper bound conditional on p_{n−1} exactly overlaps with the lower bound conditional on p_1 for a small interval from 1/n to 1/(n−1) (Fig. 3D, the orange upper bound and the navy lower bound overlap between 1/5 and 1/4; Proposition B.7). On the left-hand side of each panel, we can see that when p_i = 0, the intervals of possible Shannon entropy values for each pair of indices i_1, i_2 ≥ 2 overlap for n ≥ 3; the overlap has nonzero length at p_i = 0, except that for the pair of values (i_1, i_2) = (2, n), the intervals overlap only at a single point (Proposition B.8). This overlap of the intervals for a pair of indices continues as p_i increases; for the pair (i_1, i_2) = (2, n), the overlap is a curve rather than a region of nonzero area (Proposition B.9). We explain these observations mathematically in Appendix B.

Fig. 3 Upper and lower bounds on Shannon entropy as functions of the abundance of the ith-most abundant species, p_i, for fixed n. A n = 2. B n = 3. C n = 4. D n = 5. Lines give the upper and lower bounds from Theorems 3 and 4, colored according to which abundance, p_i, is fixed. The space between the upper and lower bounds for a given i is shaded based on the color used for the associated bounds.

Data analysis
Having established upper and lower bounds on Shannon entropy as functions of the abundance of the ith-most abundant species, we turn to two data sets in order to explore applications of the bounds to communities with vastly differing numbers of taxa: one example has dozens of taxa, the other has tens of thousands. We then compare diversity between the two examples, exploring the extent to which knowledge of the bounds helps to inform comparisons of diversity between communities occupying different regions of the space of possible taxon abundances. Wong et al. (2018) analyzed 25 coral reef communities sampled off the southern coast of Singapore. Among the communities, 18 are "fringe" reefs that border 11 offshore islands, 5 are offshore "patch" reefs that are exposed at low tide, and 2 are "regrowth" reefs growing on artificial structures. At each site, Wong et al. (2018) sampled five 20-meter transects, for each transect generating a vector of observed species abundances. The 25 × 5 = 125 vectors can each be normalized to produce relative abundance vectors that sum to 1. Across the 125 transects, 138 species were observed.

Fig. 4 For each of 25 study sites, 5 transects were measured. The 25 sites include 18 "fringe" sites that border offshore islands, 5 offshore "patch" sites that are exposed at low tide, and 2 "regrowth" sites growing on artificial structures. Accordingly, there are 18 × 5 = 90 fringe data points, 5 × 5 = 25 patch data points, and 2 × 5 = 10 regrowth data points. The three distributions differ in Shannon entropy (Kruskal-Wallis test, P = 0.037). Regrowth reefs have significantly higher values of Shannon entropy than fringe or patch reefs (Wilcoxon rank sum test, P < 0.05).

Coral reefs
The species richness of transects varies from 6 to 31, with mean 15.3, median 14, and standard deviation 6.1. To compare the species diversity across transects, we computed the Shannon entropy of each transect's species relative abundance vector. The Shannon entropy ranges from 1.1 to 3.2 across transects, with mean 2.2, median 2.2, and standard deviation 0.4. We observe in Fig. 4 that the regrowth reefs have significantly higher Shannon entropy than the patch reefs and fringe reefs (Wilcoxon rank sum test, two-tailed P = 0.028 for regrowth vs. patch, P = 0.034 for regrowth vs. fringe). No significant difference exists between patch and fringe reefs (P = 0.183).
We saw previously that the Shannon entropy of a species composition vector is influenced by many features of the vector, including the species richness and the evenness of the community composition.For example, if a community has one abundant species, then its Shannon entropy is strongly constrained by the abundance of that species.What drives the elevated diversity of the regrowth reef community when compared to the other two reef types?
Figure 5 depicts the relationship between each community's Shannon entropy and the bounds on Shannon entropy conditional on various species abundances. Inspection of these relationships reveals a visual difference among the three reef types. Regrowth reefs appear to have lower abundances for more common species such as the 1st, 2nd, and 3rd most abundant (Fig. 5A-C) and higher abundances for rarer species such as the 10th, 14th, and 18th most abundant (Fig. 5D-F), allowing for higher values of Shannon entropy. A statistical test verifies this visual observation: comparing the abundance of the ith-most abundant species in regrowth reefs to that of non-regrowth reefs, we find that the frequencies of the 1st through 3rd most abundant species in regrowth reefs are lower than those in non-regrowth reefs (Wilcoxon rank sum test, P = 0.029 for i = 1, P = 0.064 for i = 2, P = 0.036 for i = 3), whereas the frequencies of the 9th through 19th most abundant species are significantly greater in regrowth than non-regrowth reefs (Wilcoxon rank sum test, P ≤ 0.05; Figure S1). This result suggests that the regrowth reefs have higher Shannon entropy values than the patch or fringe reefs in part because of their lower abundances of common species and higher abundances of rare species, which, owing to higher upper and lower bounds at those abundances, enable the Shannon entropy to reach higher values. Use of the bounds thus helps to illustrate that the rare species drive a difference in Shannon entropy across community types.

Fig. 5 The bounds assume n = 31, as 31 was the largest number of species observed across the 125 transects (mean 15.3, median 14, standard deviation 6.1, minimum 6). Bounds are computed according to Theorems 3 and 4. Each point represents one transect; points are colored according to reef type. As in Fig. 4, each panel shows 90 fringe data points, 25 patch data points, and 10 regrowth data points.

Sponge microbiomes
We next analyzed microbial communities associated with 3533 sea sponges representing 24 distinct taxonomic orders, as sampled from 34 countries worldwide by Moitinho-Silva et al. (2017). For each sponge sample, Moitinho-Silva et al. (2017) amplified and sequenced the V4 region of the 16S rRNA gene to generate a vector of abundances of microbial operational taxonomic units (OTUs). As with the coral data, we normalized each vector to generate relative abundance vectors.
Much variability exists in microbiome composition across the sampled sponges: OTU richness varies from 1 to 21,595, with mean 2230 and median 1734. We found a wide distribution of Shannon entropy values across microbiomes, ranging from 0 to 8.1, with mean 3.4, median 3.5, and standard deviation 1.3. In Fig. 6, we plot these values of Shannon entropy against the abundance of the ith-most abundant OTU for i = 1, 2, 3, 10, 14, 18.
To explore an application of the bounds on Shannon entropy in which multiple communities have similar Shannon entropy values rather than values that differ significantly, as in the corals, we highlight in Fig. 6 three microbial communities in red, yellow, and blue. These communities have similar Shannon entropy values: 1.10 for red, 1.11 for yellow, and 1.11 for blue (Table S1), all near H(1/3, 1/3, 1/3) = log 3 ≈ 1.10. Without knowledge of their relative abundance vectors, it is difficult to interpret the similarity in Shannon entropy: one community could have high evenness and low richness, another could have high richness but uneven abundances.
Examining H in relation to p_i and the bounds on H in terms of p_i, we see that the three communities' similar values of Shannon entropy are produced by quite different relative abundance vectors. Figure 6A shows that the red community has the lowest possible value of entropy given the abundance of its most abundant OTU. The yellow community is similarly positioned near the lower bound. The blue community, on the other hand, though it has a Shannon entropy that is intermediate between the lower and upper bounds, has a value of p_1 that tightly constrains the Shannon entropy. The tight constraint suggests that the entropy of the blue community is largely determined by the most abundant OTU, whereas the red and yellow communities depend on other entries of the relative abundance vector to keep the entropy near the lower bound.
Which other entries affect the entropy? In Fig. 6B, the horizontal ordering of the points has changed: the blue community, dominated by one abundant OTU, is near p_2 = 0, whereas the red and yellow communities have p_2 ≈ p_1. Figure 6C identifies the abundance vector of the red community: the upper and lower bounds on Shannon entropy as a function of p_3 meet at p_3 = 1/3, a point attained by exactly one relative abundance vector: p = (1/3, 1/3, 1/3). As the red community has p_3 = 1/3, this plot reveals that the red community has species richness n = 3 and that its sampled community is evenly distributed. The blue community lies near p_3 = 0, and the yellow community also lies near p_3 = 0, suggesting that it is largely dominated by just its first two OTUs.
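The pinch point at p_3 = 1/3 can be verified numerically. The following is a small Python check (variable names are ours; the formulas are our transcription of Theorems 3 and 4), showing that the bounds coincide there at log 3, the entropy of the unique admissible vector (1/3, 1/3, 1/3):

```python
import math

def h(x):
    # -x*log(x), with -0*log(0) = 0.
    return -x * math.log(x) if x > 0 else 0.0

# Theorem 4 lower bound and Theorem 3 upper bound at p_i = 1/3, i = 3;
# the upper bound uses the p_i >= 1/n branch, here with n = 5.
i, p_i, n = 3, 1 / 3, 5
lower = h(1 - (i - 1) * p_i) + (i - 1) * h(p_i)
upper = i * h(p_i) + (n - i) * h((1 - i * p_i) / (n - i))
entropy_uniform3 = 3 * h(1 / 3)

print(round(lower, 4), round(upper, 4), round(entropy_uniform3, 4))
# 1.0986 1.0986 1.0986  (all equal log 3)
```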
An investigation of higher values of i (Fig. 6D-F) further illustrates that all three communities are dominated by their first few entries. The blue community continues to lie to the right of 0, with p_18 > 0 (Fig. 6F). In fact, the blue community has richness n = 1678, but because p_1 is so high, placing the community in the part of the domain where Shannon entropy is tightly constrained, the entries after the first have little impact on entropy.
Considering the relationship among three vectors with similar Shannon entropy, together with the Shannon entropy bounds as functions of p_i, has illustrated that the same Shannon entropy can indicate a low diversity conditional on a specified abundance (p_1 or p_2 for the red and yellow communities), or a highly constrained diversity largely determined by the first entry (p_1 for the blue community), despite the occurrence of many non-zero entries in the vector. Knowledge of the constraints thus aids with the interpretation of Shannon entropy across communities not only when entropy values are different, as for the coral example, but also when they are similar.

Comparing the example data sets
We now illustrate the use of the entropy bounds in comparing communities across the two examples, showing how the bounds make quite different communities commensurable. Figure 7 plots distributions of Shannon entropy across communities for the two data sets. A simple interpretation of this comparison would conclude that, in terms of Shannon entropy, the coral communities are on average less diverse than the sponge microbiome communities, because they have significantly lower Shannon entropy (Wilcoxon rank sum test, P < 2.2 × 10⁻¹⁶). The richnesses in the two examples, however, differ by nearly three orders of magnitude (n = 31 for corals, n = 21,595 for sponge microbiomes). Because the upper bound on entropy as a function of n is log n, the largest attainable value of entropy for a coral community is log 31 ≈ 3.4, whereas the corresponding maximum for a sponge microbiome is log 21,595 ≈ 10.0. How can we compare the entropies of communities that have such a large difference in taxon richness?
Consider plots of H in relation to p_i for both data sets at once, along with the bounds on entropy with respect to the ith-most abundant taxon (Fig. 8). Although the points for corals (navy) do have lower values of Shannon entropy than those for the sponge microbiomes (orange), the interpretation changes substantially. For the corals, the cloud of points is generally centered in the middle or top of the space between the bounds, with no points along the lower bound. Conversely, for the sponge microbiomes, the cloud of points consistently borders the lower bound, with few points near the upper bound. The sponge microbiome communities, though more diverse according to H, are not nearly as diverse as they could be given their high OTU richness; the coral communities, on the other hand, are diverse given their relatively low species richness.
Note, however, that this conclusion is affected by our decision to use the maximal richness as the value of n for generating the bounds.Because the maximal Shannon entropy increases with n, many communities with richness less than the maximum cannot reach the upper bound because such a choice of n sets an upper bound that is higher than would be possible in those communities.If, on the other hand, we were to generate the bounds using a value of n that lies below the maximum, then we must either exclude samples with richnesses above n or truncate and renormalize them.
To probe this limitation, we performed two additional analyses. First, we reproduced Fig. 8 using the median coral species richness n = 14 instead of the maximum n = 31 and the median sponge microbiome OTU richness n = 1734 instead of the maximum n = 21,595 (Figure S2). We truncated vectors with more than 14 coral or 1734 sponge microbiome taxa to the 14 or 1734 most abundant taxa, renormalizing each vector so that its sum was still 1. Although upper bounds for both communities shift down slightly, the main observation from Fig. 8 remains visible: coral relative abundance vectors largely occupy a region squarely between the upper and lower bounds, or even closer to the upper bound than the lower, whereas sponge microbiome relative abundance vectors reach the lower but not the upper bound. The conclusion is thus robust to this change in the way species richness is considered in obtaining the bounds.

For our second approach to addressing the role of species richness in entropy bounds, we normalized each vector's entropy using the bounds given that vector's length, n, and the chosen i:

H_norm = (H − H_min) / (H_max − H_min),  (6)

where H_max and H_min are the upper and lower bounds on entropy given p_i, i, and n (Theorems 3 and 4). This approach ensures that the bound used for a community is suited to the richness of that community. It is unsuitable only in rare cases, such as the red community in Fig. 6, in which p_i = 1/i and the abundance vector of a community is determined by a single abundance, so that the upper and lower bounds are equal and Eq. 6 has denominator zero. We exclude such communities from our calculation.
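The normalization in Eq. 6 can be sketched as follows (Python; function names are ours, and the bound formulas are our transcription of Propositions 1-2 and Theorems 3-4), including the guard for the degenerate case in which the bounds coincide:

```python
import math

def h(x):
    # -x*log(x), with -0*log(0) = 0.
    return -x * math.log(x) if x > 0 else 0.0

def entropy(p):
    return sum(h(x) for x in p)

def bounds_pi(p_i, i, n):
    """(lower, upper) bounds on H given the ith-largest entry; i = 1 uses
    Propositions 1-2, i >= 2 uses Theorems 3-4 (our transcription)."""
    if i == 1:
        k = math.ceil(1 / p_i) - 1
        lo = k * h(p_i) + h(1 - k * p_i)
        hi = h(p_i) + (n - 1) * h((1 - p_i) / (n - 1))
    else:
        lo = h(1 - (i - 1) * p_i) + (i - 1) * h(p_i)
        if i < n and p_i >= 1 / n:
            hi = i * h(p_i) + (n - i) * h((1 - i * p_i) / (n - i))
        else:
            hi = (i - 1) * h((1 - (n - i + 1) * p_i) / (i - 1)) \
                 + (n - i + 1) * h(p_i)
    return lo, hi

def normalized_entropy(p, i):
    """Eq. 6: position of H within [lower, upper] given p_i and n = len(p).
    Returns None when the bounds coincide (denominator zero), as for the
    evenly distributed red community in Fig. 6."""
    p = sorted(p, reverse=True)
    lo, hi = bounds_pi(p[i - 1], i, len(p))
    if math.isclose(lo, hi):
        return None
    return (entropy(p) - lo) / (hi - lo)

print(normalized_entropy([1/3, 1/3, 1/3], 3))  # None: bounds meet at log 3
```

Excluding communities where the bounds coincide, as the text describes, corresponds here to discarding the `None` results.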
Figure 9 presents the normalized Shannon entropy of Eq. 6 versus p_1, p_2, and p_3 for the coral and sponge microbiome data. When each vector is normalized using bounds that account for that vector's length, n, and not a possibly larger value chosen to represent a collection of vectors, as in Fig. 8, the points fall closer to the upper bound. In particular, many of the sponge microbiome relative abundance vectors have normalized entropy values equal to 1, whereas none reached the upper bound in Fig. 8. Despite this trend, the coral communities consistently have significantly higher normalized Shannon entropy than the sponge microbiome communities (Fig. 10; Wilcoxon rank sum test, P < 2.2 × 10⁻¹⁶ for normalizations based on p_1, p_2, or p_3), a reversal of the pattern for unnormalized entropy (Fig. 7). This statistical result formalizes our observation that, despite having lower absolute values of Shannon entropy than the sponge microbiome communities, the coral communities' values of Shannon entropy lie closer to their upper bounds.

Discussion
We have explored the range of Shannon entropy values that can be attained by a frequency vector of specified length and fixed ith-largest entry, such as a vector of species relative abundances in a community. Our upper and lower bounds on Shannon entropy as a function of the number of species, n, and the abundance of the ith-most common species, p_i, characterize the relationship between Shannon entropy and p_i, providing insight into the way in which entropy values are constrained by the abundances of more abundant or less abundant species.
Our main mathematical results, Theorems 3 and 4, generalize a previous result of Aw and Rosenberg (2018) on the Shannon entropy bounds with respect to p_1, the abundance of the most abundant species. For each p_i, 2 ≤ i ≤ n, the permissible region given subsequent abundance p_i contains the points (1/n, log n) and (1/i, log i) (Fig. 3). Unlike for p_1, the permissible region for p_i, i ≥ 2, contains the origin; the permissible region for p_1 instead contains the point (1, 0) (Fig. 1). The extension to p_i for i ≥ 2 characterizes a new set of similar regions, examined in detail in Appendix B, that differ substantially from the region previously studied for i = 1.
To illustrate the utility of the mathematical results for studies of biodiversity, we considered them in two data sets. In coral communities, we found that the higher Shannon entropy of regrowth reefs, as opposed to patch or fringe reefs (Fig. 4), was driven by low abundances of common taxa and high abundances of rare taxa (Fig. 5). If p_1 is large, then the upper bound on Shannon entropy given p_1 is low, so entropy must be low (Fig. 1); conversely, if subsequent p_i are large, then the lower bound on Shannon entropy given p_i is high, so entropy must be relatively large (Fig. 3D). Because common species had relatively low abundances in the regrowth reefs, the communities occupied a region of (p_i, H)-space that for small i was not tightly constrained by p_i, allowing them to achieve high values of entropy (Fig. 5A-C). Similarly, for larger i, the relatively large abundances of rare species in the regrowth reefs placed them in a region of (p_i, H)-space with high lower bounds, requiring these communities to have fairly high entropy (Fig. 5D-F). By visualizing the abundance of a fixed species in a community in relation to that community's Shannon entropy and its bounds, we were able both to identify differences between types of communities and to uncover properties of the communities that drove the differences.
In our analysis of corals, we considered communities with differing values of Shannon entropy; our second analysis, examining sponge microbiomes, considered communities with similar Shannon entropy values, despite quite different taxon abundance distributions. By studying three example communities' Shannon entropy values relative to the bounds on entropy, we identified key differences among the abundance vectors (Fig. 6). Whereas one community's entropy was strongly constrained by its large p_1 and lay close to its upper bound, the values of the other two were near their minima given p_1 (Fig. 6A). The similarity of the entropy values in these other communities was achieved by a low diversity among the subsequent taxa for communities with relatively low p_1, as can be seen from the fact that these communities have entropy values that are also quite constrained in relation to p_2, p_3, or both (Fig. 6B, C). The bounds thus assist in explaining not only differences in entropy values across communities, but also similarities.
Finally, in our comparative analysis of the two example data sets, we demonstrated that knowledge of the bounds on entropy was useful for accurately interpreting differences in Shannon entropy distributions between the coral and sponge microbiome communities, which differ greatly in species richness (Fig. 8). A naive comparison of the Shannon entropies of the two community types suggested that the sponge microbiomes were more diverse than the corals (Fig. 7). However, when the bounds were considered, either visually (Fig. 8) or via normalization of the Shannon entropy by use of the bounds (Figs. 9 and 10), the coral communities were more diverse given the constraint of their species richness and the abundances of one of their more abundant species.
All these analyses address a challenge in comparing the diversity of communities with different numbers of species. Similarities in a single statistic such as Shannon entropy can obscure meaningful compositional differences or species richness differences between communities. Our mathematical results assist in understanding how Shannon entropy is constrained by individual abundances, enabling comparisons both through visually analyzing Shannon entropy in relation to the bounds and through computations of the normalized entropy in Eq. 6 (Fig. 10). In addition to the bounds of Aw and Rosenberg (2018) on Shannon entropy in terms of p_1, the general upper bound of log n over all abundance vectors has long been used in normalizations (Pielou 1975, p. 15); a special case of a lower bound for all abundance vectors in finite samples with fixed species richness and sample size has also appeared in a normalization (Beisel and Moreteau 1997). Our normalization uses the tightest possible interval for any fixed p_i, showing that for any p_i ≠ 1/n, Shannon entropy has a tighter upper bound than log n, and a non-zero lower bound as well. The transformation in Eq. 6 follows a form familiar from other normalizations (e.g. Beisel and Moreteau 1997; Jost 2010).
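Equation 6 is not reproduced in this excerpt; the sketch below assumes it takes the familiar min-max form (H − H_min)/(H_max − H_min), with the bounds built from the extremal vectors of Theorems 3 and 4. The function names and the exact bound formulas reflect our reading of those theorems, and are intended as an illustration rather than a definitive implementation:

```python
import math

def h(x):
    """One term of Shannon entropy, with the convention 0 log 0 = 0."""
    return -x * math.log(x) if x > 0 else 0.0

def H(p):
    return sum(h(x) for x in p)

def H_min(p_i, i, n):
    """Lower bound given ith-largest entry p_i, i >= 2: entropy of the
    minimizing vector (1 - (i-1) p_i, p_i, ..., p_i, 0, ..., 0)."""
    return h(1.0 - (i - 1) * p_i) + (i - 1) * h(p_i)

def H_max(p_i, i, n):
    """Upper bound given ith-largest entry p_i, 2 <= i < n: entropy of the
    flattest vector with ith-largest entry p_i (case split at p_i = 1/n)."""
    if p_i >= 1.0 / n:
        tail = (1.0 - i * p_i) / (n - i)          # entries i+1..n
        return i * h(p_i) + (n - i) * h(tail)
    head = (1.0 - (n - i + 1) * p_i) / (i - 1)    # entries 1..i-1
    return (i - 1) * h(head) + (n - i + 1) * h(p_i)

def normalized_entropy(p, i):
    """(H - H_min) / (H_max - H_min), conditioning on the ith-largest entry."""
    q = sorted(p, reverse=True)
    lo, hi = H_min(q[i - 1], i, len(q)), H_max(q[i - 1], i, len(q))
    return (H(q) - lo) / (hi - lo) if hi > lo else 0.0

community = [0.4, 0.3, 0.2, 0.1]
assert H_min(0.3, 2, 4) <= H(community) <= H_max(0.3, 2, 4)
assert 0.0 <= normalized_entropy(community, 2) <= 1.0
```

A normalized value near 1 indicates a community nearly as even as possible given its ith-largest abundance; a value near 0 indicates entropy close to its conditional minimum.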
Because Shannon entropy is used often, mathematical properties of this statistic have implications for many routine ecological analyses. In particular, researchers reporting Shannon entropy might be advised to report not only the Shannon entropy, but also its upper and lower bounds in relation to the highest or subsequent abundances, so that the Shannon entropy value can be further contextualized. Such an approach has been suggested for a variety of diversity-related statistics in population genetics, for which mathematical bounds in relation to allele frequencies have been analogously reported (e.g. Maruki et al. 2012; Jakobsson et al. 2013; Garud and Rosenberg 2015; Alcala and Rosenberg 2017, 2019, 2022).
Our examples have demonstrated the value of the bounds in understanding the drivers of empirical differences in biodiversity between communities. However, the bounds might also be useful in efforts to test model-based theoretical predictions about species abundances (e.g. Chave 2004; Rosindell et al. 2011). For example, in tests of neutral and other models describing rare taxa in a community (e.g. Magurran and Henderson 2003), versions of Shannon entropy normalized by the bounds conditional on the abundances of common taxa could control for those abundances, serving as biodiversity metrics sensitive to the abundances of rare taxa.
The data analyses in corals and sponge microbiomes follow population-genetic studies such as Aw and Rosenberg (2018) in treating quantities measured in samples as parametric. A potential extension could incorporate the fact that both the relative abundances and the number of distinct species itself are measured in samples. Equation 12 of Alcala and Rosenberg (2017) described related bounds on an estimated value of the population-genetic statistic F_ST in terms of a sample frequency. In that setting, the number of alleles was fixed at 2, but here, the number of distinct species in a sample underestimates the number in the full community, so that the permissible range for the estimated Shannon entropy might systematically expand, owing to the increased number of species, as the sample is enlarged. An extension to the bounds that incorporates sampling phenomena might make use of approaches to estimation of Shannon entropy in the setting of species accumulation with increasing sample size (Chao and Shen 2003; Chao and Jost 2015).
Although our use of Shannon entropy has focused on biodiversity measurement, the mathematical results are broader in scope. First, although we have used the language of species abundances, the entropy bounds apply to any finite-length vectors of non-negative elements that sum to 1, and are thus not limited to ecological abundance data. The bounds could have uses for other settings in which Shannon entropy is used as a diversity statistic, such as for other taxonomic levels or for population-genetic data (Sherwin et al. 2006, 2017; Aw and Rosenberg 2018). They contribute to an interdisciplinary body of work developing bounds on Shannon entropy in various contexts (e.g. Dembo et al. 1991; Berry and Sanders 2003; Khan et al. 2017).
Second, we have obtained our mathematical bounds on Shannon entropy as a corollary of general theorems that concern statistics with particular convexity properties (Appendix A). Related statistics such as the Rényi entropies or Hill numbers (Hill 1973; Jost 2006; Chao et al. 2014) possess the required properties, so that similar bounds will follow for these statistics in relation to the abundance of the ith most abundant species. Extensions could explore constraints on these statistics and the applications of the constraints to ecological data.
Note that we assume here that entries of w and v are in decreasing order.For two vectors that are not necessarily in decreasing order, w is also said to majorize v if, when the vector entries are permuted so that they are in decreasing order, condition (i) holds for the permuted vectors.
We begin by proving that the vector p introduced in the statement of Theorem 3 is majorized by all other vectors with ith-largest entry p_i (Lemma A.2). Next, we prove that the vector p introduced in the statement of Theorem 4 majorizes all other vectors with ith-largest entry p_i (Lemma A.3). We then need only introduce the idea of functions that preserve order under majorization before we can quickly obtain Theorems 3 and 4.

Lemma A.2 Consider a non-negative vector w of length n with (i) entries in decreasing order, w_i ≥ w_j for i < j, (ii) Σ_{i=1}^n w_i = 1, and (iii) for some i ≥ 2, the ith-largest entry in w, w_i, equal to p_i. Then w majorizes the vector p of Theorem 3.
Proof Let w be a vector of length n whose entries are in decreasing order, the sum of whose entries is 1, and whose ith component equals p_i. Then w_j ≥ p_i for all j ≤ i, w_j ≤ p_i for all j ≥ i, and w_i = p_i. Condition (ii) of Definition A.1 is satisfied, as w and p both have sum 1. To verify condition (i), we break the problem into two cases.
(1) Suppose p_i ≥ 1/n, so that p has its first i entries equal to p_i and its last n − i entries equal to (1 − i p_i)/(n − i). Because w_j ≥ p_i for all j ≤ i, for all k in [1, i], Σ_{j=1}^k w_j ≥ k p_i = Σ_{j=1}^k p_j. It remains to show that for all k in [i+1, n], Σ_{j=1}^k w_j ≥ Σ_{j=1}^k p_j.

Suppose for contradiction that Σ_{j=1}^k w_j < Σ_{j=1}^k p_j for some k in [i+1, n]. We have already shown that Σ_{j=1}^i w_j ≥ i p_i. We are left with Σ_{j=i+1}^k w_j < (k − i)(1 − i p_i)/(n − i). Dividing by k − i, the mean of the w_j for j in [i+1, k] is less than (1 − i p_i)/(n − i). Because the w_j are in decreasing order, w_k, the smallest of the w_j for j in [i+1, k], is also less than (1 − i p_i)/(n − i), and so is each w_j for j in [k+1, n]. Summing all n entries then gives Σ_{j=1}^n w_j < i p_i + (n − i)(1 − i p_i)/(n − i) = 1, a contradiction of the fact that the entries of w sum to 1. We conclude that for all k in [i+1, n], Σ_{j=1}^k w_j ≥ Σ_{j=1}^k p_j. As we already showed that Σ_{j=1}^k w_j ≥ Σ_{j=1}^k p_j for all k in [1, i], we have proven w ≻ p.
(2) Suppose p_i ≤ 1/n, so that p has its first i − 1 entries equal to c = (1 − (n − i + 1) p_i)/(i − 1) and its last n − i + 1 entries equal to p_i. Because w_j ≤ p_i for all j ≥ i, Σ_{j=i}^n w_j ≤ (n − i + 1) p_i, and hence Σ_{j=1}^{i−1} w_j ≥ 1 − (n − i + 1) p_i = (i − 1) c. Because the entries of w are in decreasing order, for each k in [1, i−1] the mean of w_1, ..., w_k is at least the mean of w_1, ..., w_{i−1}, so that

Σ_{j=1}^k w_j ≥ k c = Σ_{j=1}^k p_j for all k in [1, i−1].   (A1)

Setting k = i − 1, we add w_i = p_i to both sides of Eq. A1, obtaining Σ_{j=1}^i w_j ≥ Σ_{j=1}^i p_j. For j in [i+1, n], p_j = p_i. Suppose for contradiction that Σ_{j=1}^k w_j < Σ_{j=1}^k p_j for some k in [i+1, n]. We then have Σ_{j=k+1}^n w_j = 1 − Σ_{j=1}^k w_j > 1 − Σ_{j=1}^k p_j = (n − k) p_i: a sum of n − k terms exceeds (n − k) p_i, so that the largest of the terms exceeds p_i. We have reached a contradiction of the decreasing order of the entries in w, as w_{k+1}, w_{k+2}, ..., w_n are all bounded above by w_i = p_i. We conclude Σ_{j=1}^k w_j ≥ Σ_{j=1}^k p_j for all k in [i+1, n].

As we already showed that Σ_{j=1}^k w_j ≥ Σ_{j=1}^k p_j for all k in [1, i], we have shown that for all k in [1, n], Σ_{j=1}^k w_j ≥ Σ_{j=1}^k p_j, and therefore w ≻ p.
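The prefix-sum comparisons in this proof lend themselves to direct numerical checking. The sketch below, an illustration of our own rather than code from the paper, builds the vector p of Theorem 3 for the case p_i ≤ 1/n (its form following our reading of the case analysis) and confirms that an example vector w with the same ith-largest entry majorizes it:

```python
def majorizes(w, v, tol=1e-12):
    """True if w majorizes v: equal totals, and every prefix sum of the
    decreasingly sorted w is at least the corresponding prefix sum of v."""
    ws, vs = sorted(w, reverse=True), sorted(v, reverse=True)
    if abs(sum(ws) - sum(vs)) > tol:
        return False
    cw = cv = 0.0
    for a, b in zip(ws, vs):
        cw, cv = cw + a, cv + b
        if cw < cv - tol:
            return False
    return True

# Theorem 3's vector for p_i <= 1/n (our reading): i - 1 equal "head"
# entries followed by n - i + 1 entries equal to p_i.
n, i, p_i = 5, 3, 0.15
head = (1.0 - (n - i + 1) * p_i) / (i - 1)
p_max = [head] * (i - 1) + [p_i] * (n - i + 1)

# Any decreasing vector summing to 1 whose ith-largest entry is p_i
# should majorize p_max (Lemma A.2).
w = [0.45, 0.20, 0.15, 0.15, 0.05]
assert w[i - 1] == p_i and abs(sum(w) - 1.0) < 1e-12
assert majorizes(w, p_max)
```

The same helper can be reused to spot-check Lemma A.3 by reversing the roles of the two vectors.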
Lemma A.3 Consider a non-negative vector v of length n with (i) entries in decreasing order, v_i ≥ v_j for i < j, (ii) Σ_{i=1}^n v_i = 1, and (iii) for some i ≥ 2, the ith-largest entry in v, v_i, equal to p_i. Then v is majorized by the vector p of Theorem 4.

Proof We first show that Σ_{j=1}^k v_j ≤ Σ_{j=1}^k p_j for k in [1, i−1], where p = (1 − (i − 1) p_i, p_i, ..., p_i, 0, ..., 0) is the vector of Theorem 4, with i − 1 entries equal to p_i following the first entry and n − i entries equal to 0, so that Σ_{j=1}^k p_j = 1 − (i − k) p_i for k in [1, i]. Because v_j ≥ v_i = p_i for j in [k+1, i] and v_j ≥ 0 for j in [i+1, n], we have Σ_{j=k+1}^n v_j ≥ (i − k) p_i, and hence Σ_{j=1}^k v_j = 1 − Σ_{j=k+1}^n v_j ≤ 1 − (i − k) p_i = Σ_{j=1}^k p_j.

For k in [i, n], Σ_{j=1}^k p_j = 1, so it is trivially true that Σ_{j=1}^k v_j ≤ Σ_{j=1}^k p_j. We have thus shown that Σ_{j=1}^k v_j ≤ Σ_{j=1}^k p_j for all k in [1, n]. We conclude that v ≺ p.
We now apply the lemmas. For convenience, we denote by Δ_{n−1} the set of non-negative vectors of length n with sum equal to 1.

Definition A.4 Consider a function F : Δ_{n−1} → R so that for all vectors w and v with w ≻ v, F(w) ≥ F(v). Such a function is said to be Schur-convex. If instead, for all vectors w and v with w ≻ v, F(w) ≤ F(v), F is said to be Schur-concave.
A Schur-convex function preserves the ordering of the vectors in Δ_{n−1} under majorization. A Schur-concave function reverses the ordering.
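The Schur-concavity of Shannon entropy can be illustrated numerically: transferring mass from a rarer entry to a more abundant one produces a vector that majorizes the original, and the entropy does not increase. A minimal sketch (ours; the helper names are assumptions, not the paper's notation):

```python
import math

def shannon_entropy(p):
    """H(p) with base-e logarithm and the convention 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def majorizes(w, v, tol=1e-12):
    """Prefix-sum test for majorization of v by w (Definition A.1)."""
    ws, vs = sorted(w, reverse=True), sorted(v, reverse=True)
    cw = cv = 0.0
    for a, b in zip(ws, vs):
        cw, cv = cw + a, cv + b
        if cw < cv - tol:
            return False
    return abs(cw - cv) <= tol  # equal totals

v = [0.4, 0.3, 0.2, 0.1]
# Transfer 0.05 from the rarest entry to the most abundant one: w majorizes v.
w = [0.45, 0.3, 0.2, 0.05]

assert majorizes(w, v)
# Schur-concavity: w majorizing v implies H(w) <= H(v).
assert shannon_entropy(w) <= shannon_entropy(v)
```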
Theorem A.5 Consider a function F : Δ_{n−1} → R. If F is Schur-concave, then the vector p of Theorem 3 maximizes F over the subset of Δ_{n−1} with ith-largest entry equal to p_i. If F is Schur-convex, then this same vector p minimizes F over the subset of Δ_{n−1} with ith-largest entry equal to p_i.
Proof By Lemma A.2, the vector p of Theorem 3 is majorized by every vector w ∈ Δ_{n−1} with fixed p_i. That is, for all w in Δ_{n−1} with ith-largest entry equal to p_i, w ≻ p. By definition of Schur-concavity and Schur-convexity, if F is Schur-concave, then F(w) ≤ F(p) for all such w, and p maximizes F. If F is Schur-convex, then F(w) ≥ F(p) for all such w, and p minimizes F.

Theorem A.6 Consider a function F : Δ_{n−1} → R. If F is Schur-concave, then the vector p of Theorem 4 minimizes F over the subset of Δ_{n−1} with ith-largest entry equal to p_i. If F is Schur-convex, then this same vector p maximizes F over the subset of Δ_{n−1} with ith-largest entry equal to p_i.
Proof By Lemma A.3, the vector p of Theorem 4 majorizes every vector v ∈ Δ_{n−1} with fixed p_i. That is, for all v in Δ_{n−1} with ith-largest entry equal to p_i, v ≺ p. By definition of Schur-concavity and Schur-convexity, if F is Schur-concave, then F(v) ≥ F(p) for all such v, and p minimizes F. If F is Schur-convex, then F(v) ≤ F(p) for all such v, and p maximizes F.

Proof of Theorem 3
Shannon entropy is Schur-concave (Marshall et al. 2010, pp. 101, 562). By Theorem A.5, the vector p of Theorem 3 is the vector in Δ_{n−1} with fixed p_i that maximizes Shannon entropy.

Proof of Theorem 4
Shannon entropy is Schur-concave (Marshall et al. 2010, pp. 101, 562). By Theorem A.6, the vector p of Theorem 4 is the vector in Δ_{n−1} with fixed p_i that minimizes Shannon entropy.
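As a combined numerical check of Theorems 3 and 4, the sketch below (our own illustration; the extremal vector forms follow our reading of the two theorems) verifies that the entropies of several hand-built vectors of length n = 5 with third-largest entry fixed at 0.15 fall between the two bounds:

```python
import math

def H(p):
    """Shannon entropy, base e, with 0 log 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def extremal_vectors(p_i, i, n):
    """Minimizing (Theorem 4) and maximizing (Theorem 3) vectors with
    ith-largest entry p_i, 2 <= i < n, under our reading of the theorems."""
    p_min = [1.0 - (i - 1) * p_i] + [p_i] * (i - 1) + [0.0] * (n - i)
    if p_i >= 1.0 / n:
        tail = (1.0 - i * p_i) / (n - i)
        p_max = [p_i] * i + [tail] * (n - i)
    else:
        head = (1.0 - (n - i + 1) * p_i) / (i - 1)
        p_max = [head] * (i - 1) + [p_i] * (n - i + 1)
    return p_min, p_max

i, n, p_i = 3, 5, 0.15
p_min, p_max = extremal_vectors(p_i, i, n)

examples = [
    [0.45, 0.20, 0.15, 0.15, 0.05],
    [0.30, 0.25, 0.15, 0.15, 0.15],
    [0.55, 0.15, 0.15, 0.15, 0.00],
]
for p in examples:
    assert abs(sum(p) - 1.0) < 1e-12 and sorted(p, reverse=True)[i - 1] == p_i
    # Theorems 3 and 4: H(p_min) <= H(p) <= H(p_max).
    assert H(p_min) - 1e-12 <= H(p) <= H(p_max) + 1e-12
```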

Appendix B
In this appendix, we provide mathematical proofs of informal claims from the main text about the bounds, as observed in Fig. 3. As in Appendix A, Δ_{n−1} denotes the set of non-negative vectors of length n with sum equal to 1.

Proposition B.1 Consider vectors p in Δ_1, that is, frequency vectors of length n = 2. (i) Given either fixed entry, the upper and lower bounds on Shannon entropy coincide. (ii) The domain for p_1 is [1/2, 1], and the domain for p_2 is [0, 1/2].
Proof (i) Because p_1 = 1 − p_2, fixing one of the p_i determines the other and hence the Shannon entropy, H = h(p_1) + h(1 − p_1). Thus, Shannon entropy has only one possible value for vectors of length 2 with one fixed entry, and the upper and lower bounds given p_i exactly overlap.
(ii) Because p_1 ≥ p_2, the domain for p_1 is [1/2, 1] and the domain for p_2 is [0, 1/2].

Proof (i) For fixed p_3 in [0, 1/3], Shannon entropy is minimized at (1 − 2p_3, p_3, p_3) (Theorem 4). In this same interval for p_2, Shannon entropy is maximized at the vector (1 − 2p_2, p_2, p_2) (Theorem 3). These vectors are the same when the same value is inserted for the lone free variable, confirming the exact overlap of the upper bound given p_2 and the lower bound given p_3 on the interval [0, 1/3]. (ii) In [1/3, 1/2], the vector minimizing Shannon entropy given fixed p_1 is (p_1, p_1, 1 − 2p_1) (Proposition 2), and the vector maximizing Shannon entropy given fixed p_2 is (p_2, p_2, 1 − 2p_2) (Theorem 3). These vectors are identical when the same value is inserted for the lone free variable, confirming the exact overlap of the upper bound given p_2 and the lower bound given p_1 on the interval [1/3, 1/2].

To prove (C.i), it suffices to prove f(x, i, n) > 0 for x in (1/n, 1/(i+1)]. To prove f(x, i, n) > 0, we prove (a) f is strictly convex for x in (1/n, 1/(i+1)); (b) f has a local minimum at x = 1/n; (c) f(x = 1/n, i, n) ≥ 0. For the continuous function f, these three claims suffice to demonstrate f(x, i, n) > 0 for x in (1/n, 1/(i+1)]: if f is strictly convex on (1/n, 1/(i+1)) and has a minimum at x = 1/n, then f must increase monotonically on (1/n, 1/(i+1)); together with f(x = 1/n, i, n) ≥ 0, it follows that f(x, i, n) > 0 for all x in (1/n, 1/(i+1)]. (a) Taking the second derivative and noting that 1 − ix and 1 − (i + 1)x are positive on (1/n, 1/(i+1)), we find that the second derivative is positive, so that f is strictly convex on this interval.

(C.iii) The proof is similar to that of (A.iii), as we continue to use the function g(x, i, n) from the proof of that statement. Let n ≥ 3 and suppose 3 ≤ i ≤ n − 1. To prove (C.iii), it suffices to prove g(x, i, n) > 0 for x in (1/n, 1/i]. To prove g(x, i, n) > 0, we prove (a) g is strictly concave for x in (1/n, 1/i); (b) g is positive at the boundaries of the interval, g(1/n) > 0 and g(1/i) > 0.
For the continuous function g, these two claims suffice to demonstrate g(x, i, n) > 0 for x in (1/n, 1/i]: if g is strictly concave on (1/n, 1/i), then it has a monotonically decreasing slope, so that if it is positive at the start and end of the interval, it cannot decrease to zero in the interior of the interval.

Fig. 1
Fig. 1 Upper and lower bounds on Shannon entropy as functions of the abundance of the most abundant species, p_1, for varying species richness, n. A n = 2. B n = 3. C n = 4. D n = 5. E n = 6. F n = 7. G n = 10. H n = 100. I n = 10,000. For fixed n, H is maximized when p = (1/n, 1/n, ..., 1/n), with H(1/n, 1/n, ..., 1/n) = log n. As the number of entries, n, increases, this upper bound, log n, increases, so the range of the y-axis increases. The bounds are taken from Propositions 1 and 2. Note that panels H and I have y-axis scales that differ from those of the other panels

Fig. 4
Fig. 4 Distributions of Shannon entropy for three coral reef types. Each point represents a relative abundance vector of coral species along one transect. For each of 25 study sites, 5 transects were measured. The 25 sites include 18 "fringe" sites that border offshore islands, 5 offshore "patch" sites that are exposed at low tide, and 2 "regrowth" sites growing on artificial structures. Accordingly, there are 18 × 5 = 90 fringe data points, 5 × 5 = 25 patch data points, and 2 × 5 = 10 regrowth data points. The three distributions differ in Shannon entropy (Kruskal-Wallis test, P = 0.037). Regrowth reefs have significantly higher values of Shannon entropy than fringe or patch reefs (Wilcoxon rank sum test, P < 0.05)

Fig. 5
Fig. 5 Upper and lower bounds on Shannon entropy for coral communities, as functions of the abundance of the ith-most abundant species. A i = 1. B i = 2. C i = 3. D i = 10. E i = 14. F i = 18. The bounds assume n = 31, as 31 was the largest number of species observed across the 125 transects (mean 15.3, median 14, standard deviation 6.1, minimum 6). Bounds are computed according to Theorems 3 and 4. Each point represents one transect; points are colored according to reef type. As in Fig. 4, each panel shows 90 fringe data points, 25 patch data points, and 10 regrowth data points

Fig. 6
Fig. 6 Upper and lower bounds on Shannon entropy for sponge microbiome communities, as functions of the abundance of the ith-most abundant OTU. A i = 1. B i = 2. C i = 3. D i = 10. E i = 14. F i = 18. The bounds assume n = 21,595, the largest number of OTUs observed across the 3533 microbiomes (mean 2230, median 1734, standard deviation 2072, minimum 1). Each count in the heat map represents one sampled sponge microbiome; the heat map summarizes 3533 microbiomes. The three highlighted points represent similar Shannon entropy (H ≈ 1.1 ≈ log 3) but different evenness. Details on the three points appear in Table S1

Fig. 7 Fig. 8
Fig. 7 Distributions of Shannon entropy for coral (navy) and sponge microbiome communities (orange). There are 125 total coral communities and 3533 total sponge microbiomes represented. Entropy values for the coral communities are the same as those presented in Figs. 4 and 5, whereas those for the sponge microbiomes are the same as those presented in Fig. 6

Fig. 9 Fig. 10
Fig. 9 Shannon entropy for coral (navy) and sponge microbiome communities (orange), normalized in relation to entropy bounds as functions of the abundance of the ith-most abundant taxon. A i = 1. B i = 2. C i = 3. Each point represents one sampled relative abundance vector, with Shannon entropy normalized using Eq. 6

(b) We show by direct computation that ∂f/∂x (x = 1/n, i, n) = 0. (c) We show by direct computation that f(x = 1/n, i, n) = 0. (C.ii) H_max(p_{i_2}, n) ≥ H_min(p_{i_2}, n) by definition of the bounds in Theorems 3 and 4.
(a) This statement follows as in (A.iii.a): the second derivative in Eq. B1 is negative on (1/n, 1/i_2]. (b) This statement follows for the left endpoint x = 1/n as in (A.iii.b); for the right endpoint, x = 1/i, g(x, i, n) = (2/i) log 2 > 0.