From Assessing to Conserving Biodiversity pp 123-136
Measures of Biological Diversity: Overview and Unified Framework
Abstract
A variety of statistical measures of diversity have been employed across biology and ecology, including Shannon entropy, the Gini-Simpson index, so-called effective numbers of species (aka Hill’s measures), and more besides. I will review several major options and then present a comprehensive formalism in which all these can be embedded as special cases, depending on the setting of two parameters, labelled degree and order. This mathematical framework is adapted from generalized information theory. A discussion of the theoretical meaning of the parameters in biological applications provides insight into the conceptual features and limitations of current approaches. The unified framework described also allows for the development of a tailored solution for the measurement of biological diversity that jointly satisfies otherwise divergent desiderata put forward in the literature.
Keywords
Diversity, Richness, Evenness, Entropy, Information theory
Suppose that four different species, X, Y, W, and Z, are present in a given environment at a certain time, counting exactly 50, 25, 15, and 10 organisms each, respectively. At the same time in a different location (alternatively: at a later moment in the same area) the numbers are 40, 30, 30, and 0, respectively. In which of these two situations does one deal with a more diverse community?
This example, although drastically simplified, illustrates a rather general problem. With minor variations, X, Y, W, and Z might just as well be the firms operating in a sector of the economy (see, e.g., Chakravarty and Eichhorn 1991), the languages spoken in a region (see, e.g., Greenberg 1956), the types of television channels in a country (see, e.g., Aslama et al. 2004), or the parties in a political system (see, e.g., Golosov 2010) characterized by their shares of market, speakers, overall broadcasting, or parliamentary seats. In each of these domains, and still others, measuring diversity (or, conversely, concentration) has been a significant scientific issue. In biology and ecology, tracking diversity over space and time is of course a key topic for environmental concerns, but biological diversity also plays a relevant theoretical role for its connections with other variables of interest, such as stability, predation pressure, and so on.
In this chapter, I will review a variety of measures of biological diversity that have been employed and discussed across the scientific literature. Relying on a few intuitive illustrations, I will carry out an assessment of the strengths and limitations of some major options, including Shannon entropy, the Gini-Simpson index, so-called effective numbers of species, and more besides. I will also highlight the appeal of one specific measure, which might not have received adequate attention so far. Finally, I will describe a comprehensive formalism and point out how all diversity measures previously considered can be conveniently embedded in it as special cases, depending on the setting of two parameters, labelled order and degree. This unified mathematical framework is adapted from generalized information theory (Aczél 1984). As we will see, it provides insight into the conceptual features of current approaches and can allow for tailored technical solutions for the measurement of biological diversity.
6.1 Richness
For our current purposes, measuring diversity has to do with how a given quantity is distributed among some well-defined categories.^{1} In biological applications, it is typical (although by no means necessary) that such categories amount to distinct species characterized by their relative abundance. The latter quantity, in turn, is often simply measured by the proportion of organisms of that species in the overall target population (but biomass can be employed as well, for instance). In what follows, we will denote diversity as D(p_{1}, …, p_{n}), where n species (s_{1}, …, s_{n}) are involved and p_{i} is the relative abundance of the i-th species, s_{i}. With this bit of formal notation, we can thus represent our initial example with categories X, Y, W, and Z as concerning the comparison between D(0.50, 0.25, 0.15, 0.10) and D(0.40, 0.30, 0.30, 0). Note that, as a direct consequence of our definition, p_{1}, …, p_{n} will always be non-negative numbers summing to 1 (a species that is absent has p_{i} = 0), so that (p_{1}, …, p_{n}) actually represents a probability distribution. In our canonical interpretation, p_{i} equals the probability that a randomly selected individual from the target population belongs to species s_{i}.
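As a minimal sketch of this notation, raw counts such as those of the opening example can be converted into the relative abundances that D takes as arguments. The helper name `relative_abundances` is illustrative and not taken from the chapter.

```python
def relative_abundances(counts):
    """Convert raw organism counts into relative abundances p_1, ..., p_n."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one organism is required")
    return [c / total for c in counts]


# Counts for species X, Y, W, Z in the two situations of the opening example.
first = relative_abundances([50, 25, 15, 10])
second = relative_abundances([40, 30, 30, 0])

print(first)
print(second)
# Each result is a probability distribution: non-negative entries summing to 1.
print(sum(first), sum(second))
```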
Test Case 1
(Jost 2006, p. 363). Let us consider communities consisting of n equally common species, like (1/5, …, 1/5) (with n = 5) and (1/10, …, 1/10) (with n = 10). Arguably, diversity in the latter case should be exactly twice that in the former. A compelling measure of diversity should recover this assessment. Richness clearly does, because Richness(1/10, …, 1/10) = 10 and Richness(1/5, …, 1/5) = 5.
The richness measure is of course completely transparent in its interpretation, but it is also simplistic in a fairly obvious way: it is entirely insensitive to how even/uneven the distribution is. Here is a second test case to clarify the point.
Test Case 2
(evenness sensitivity, see Pielou 1975, p. 7). For a given number of species n, a compelling measure of diversity should assign maximum value to a completely even distribution (with p_{1} = … = p_{n} = 1/n), and a strictly lower value to a distribution which is much more skewed. Richness, however, clearly fails this condition. For instance, one has Richness(0.25, 0.25, 0.25, 0.25) = 4 = Richness(0.97, 0.01, 0.01, 0.01).
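Both test cases can be checked in a few lines. The sketch below assumes that richness is simply the number of species with strictly positive relative abundance, which matches the values quoted above; the function name is illustrative.

```python
def richness(p):
    """Assumed definition: number of species with positive relative abundance."""
    return sum(1 for p_i in p if p_i > 0)


# Test Case 1: doubling the number of equally common species doubles richness.
five = [1 / 5] * 5
ten = [1 / 10] * 10
print(richness(five), richness(ten))  # 5 10

# Test Case 2: richness cannot tell an even community from a very skewed one.
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.97, 0.01, 0.01, 0.01]
print(richness(even), richness(skewed))  # 4 4
```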
6.2 Entropies and Diversity
One important and well-known feature of both the Shannon and the Gini index of diversity is that they are concave functions. The formal definition of concavity involves the notion of a mixture \( {M}_{\alpha}^{P,{P}^{\ast }}=\left(\alpha {p}_1+\left(1-\alpha \right){p}_1^{\ast },\dots, \alpha {p}_n+\left(1-\alpha \right){p}_n^{\ast}\right) \) of two distributions of relative abundance P = (p_{1}, … , p_{n}) and \( {P}^{\ast }=\left({p}_1^{\ast },\dots, {p}_n^{\ast}\right) \), where α ∈ [0,1] determines the relative weights of the distributions combined. As a plain illustration, if P = (0.9, 0.1), P^{*} = (0.7, 0.3), and α = 0.5, then the mixture is \( {M}_{\alpha}^{P,{P}^{\ast }} \) = (0.8, 0.2). A measure of diversity D is then said to be concave if and only if, for any P, P^{*}, and any α ∈ [0,1], one has \( \alpha D(P)+\left(1-\alpha \right)D\left({P}^{\ast}\right)\le D\left({M}_{\alpha}^{P,{P}^{\ast }}\right) \). Concavity is sometimes advocated for measures of biological diversity as conveying the idea that, if one pools together two distinct populations X and Y characterized by possibly different abundance distributions (P vs. P^{*}) over the same list of n species, then the aggregate should have a diversity that is at least as great as the average of the initial diversities of X and Y. Whether or not such a condition is meant to be compelling in general, one significant implication of concavity is involved in the following example.
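The concavity inequality can be checked numerically on the plain illustration just given. This is only a sketch: it uses the Shannon and Gini formulas as they appear in the table of Section 6.4, with the natural logarithm, and the function names are illustrative. A single instance does not prove concavity, of course; it only shows what the inequality asserts.

```python
import math


def shannon(p):
    """Shannon entropy (natural log); absent species contribute nothing."""
    return sum(p_i * math.log(1 / p_i) for p_i in p if p_i > 0)


def gini(p):
    """Gini-Simpson index: one minus the sum of squared abundances."""
    return 1 - sum(p_i ** 2 for p_i in p)


def mixture(p, p_star, alpha):
    """Component-wise mixture of two abundance distributions."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(p, p_star)]


P, P_star, alpha = [0.9, 0.1], [0.7, 0.3], 0.5
M = mixture(P, P_star, alpha)  # approximately (0.8, 0.2)

for D in (shannon, gini):
    average = alpha * D(P) + (1 - alpha) * D(P_star)
    # Concavity requires: average of the diversities <= diversity of the mixture.
    print(D.__name__, round(average, 3), "<=", round(D(M), 3), average <= D(M))
```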
Test Case 3
Consider three successive moments in time t_{1}, t_{2}, and t_{3}, with corresponding relative abundance distributions of a population with n = 10, as follows.
t_{1}: (0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1)
t_{2}: (0.5, 0.1, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05)
t_{3}: (0.9, 0.1, 0, 0, 0, 0, 0, 0, 0, 0)
Quite clearly, a drop in diversity occurred from t_{1} to t_{2}. But, arguably, the loss of diversity was even greater from t_{2} to t_{3}, essentially because most of the species (in fact, 80% of them) disappeared altogether. A compelling measure of diversity should recover this assessment, and concave measures such as D_{Shannon} and D_{Gini} do. D_{Shannon} takes values 2.303 at t_{1}, 1.775 at t_{2}, and then drops to 0.325 at t_{3}. With D_{Gini}, we have 0.90, 0.72, and 0.18, respectively.
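The figures quoted for Test Case 3 can be reproduced with a short script. It is a sketch using the natural logarithm and the Shannon and Gini formulas listed in the table of Section 6.4; the function names are illustrative.

```python
import math


def shannon(p):
    """Shannon entropy (natural log); absent species contribute nothing."""
    return sum(p_i * math.log(1 / p_i) for p_i in p if p_i > 0)


def gini(p):
    """Gini-Simpson index: one minus the sum of squared abundances."""
    return 1 - sum(p_i ** 2 for p_i in p)


t1 = [0.1] * 10
t2 = [0.5, 0.1] + [0.05] * 8
t3 = [0.9, 0.1] + [0.0] * 8

for D in (shannon, gini):
    v1, v2, v3 = D(t1), D(t2), D(t3)
    print(D.__name__, round(v1, 3), round(v2, 3), round(v3, 3))
    # Test Case 3: the loss from t2 to t3 should exceed the loss from t1 to t2.
    print("  second drop is larger:", (v2 - v3) > (v1 - v2))
```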
Biologists frequently use measures of diversity to detect changes in the environment due to pollution, climate change, or other factors. […] Suppose a continent has a million equally common species, and a meteor impact kills 999,900 of the species, leaving 100 species untouched. Any biologist, if asked, would say that this meteor impact caused a large absolute and relative drop in diversity. Yet D_{Gini} only decreases from 0.999999 to 0.99, a drop of less than 1%. Evidently, the metric of this measure does not match the intuitive concept of diversity as used by biologists, and ecologists relying on D_{Gini} will often misjudge the magnitude of ecosystem change. This same problem arises when D_{Shannon} is equated with diversity.
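The arithmetic behind the meteor illustration is easy to verify. For n equally common species the sum of squared abundances is n · (1/n)² = 1/n, so the Gini index equals 1 − 1/n. The sketch below relies on that closed form; the function name is illustrative.

```python
def gini_uniform(n):
    """Gini-Simpson index of n equally common species: 1 - 1/n."""
    return 1 - 1 / n


before = gini_uniform(1_000_000)  # one million equally common species
after = gini_uniform(100)         # one hundred survivors
relative_drop = (before - after) / before

print(before, after)
print(f"relative drop: {relative_drop:.4%}")  # just under 1%
```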
6.3 Effective Numbers
Consider Jost’s meteor illustration above. According to the replication principle (which Jost 2009 strongly advocates), diversity should be 10,000 times lower after the impact than it was before (1,000,000/100). Is it possible to define a measure such that (as happens with richness but not with entropy) the replication principle is retained and at the same time (as happens with entropy but not with richness) the (un)evenness of the distribution is also integrated in the assessment of diversity?
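One candidate can be sketched as follows, assuming that the Gini-based effective number is the reciprocal of the sum of squared abundances, that is, the starred quantity in the table of Section 6.4 plus one. Under that assumption the measure recovers the factor of 10,000 in the meteor case while still ranking an even community above a skewed one. The function names are illustrative.

```python
import math


def gini_effective_number(p):
    """Assumed Gini-based effective number: 1 / sum of squared abundances."""
    return 1 / math.fsum(p_i ** 2 for p_i in p)


def uniform(n):
    """n equally common species."""
    return [1 / n] * n


# Replication principle on the meteor illustration.
before = gini_effective_number(uniform(1_000_000))
after = gini_effective_number(uniform(100))
ratio = before / after
print(round(before), round(after), round(ratio))  # 1000000 100 10000

# Evenness sensitivity on the distributions of Test Case 2.
even = gini_effective_number([0.25, 0.25, 0.25, 0.25])
skewed = gini_effective_number([0.97, 0.01, 0.01, 0.01])
print(even > skewed)  # True
```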
Some properties of alternative ways to quantify biological diversity.
Measure | Test case 1 (replication principle) | Test case 2 (evenness sensitivity) | Test case 3 (concavity) |
---|---|---|---|
Richness | Yes | No | Yes |
Entropy (D_{Gini}) | No | Yes | Yes |
Effective number (D_{Gini-EN}) | Yes | Yes | No |
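The last column of the table can be reproduced on the three distributions of Test Case 3. The sketch assumes that richness counts the species present and that D_{Gini-EN} is the reciprocal of the sum of squared abundances; both function names are illustrative.

```python
def richness(p):
    """Assumed definition: number of species with positive abundance."""
    return sum(1 for p_i in p if p_i > 0)


def gini(p):
    """Gini-Simpson index."""
    return 1 - sum(p_i ** 2 for p_i in p)


def gini_en(p):
    """Assumed Gini-based effective number: 1 / sum of squared abundances."""
    return 1 / sum(p_i ** 2 for p_i in p)


t1 = [0.1] * 10
t2 = [0.5, 0.1] + [0.05] * 8
t3 = [0.9, 0.1] + [0.0] * 8


def passes_test_case_3(D):
    """True if the loss from t2 to t3 exceeds the loss from t1 to t2."""
    return (D(t2) - D(t3)) > (D(t1) - D(t2))


for D in (richness, gini, gini_en):
    print(D.__name__, passes_test_case_3(D))
```

With the effective number, diversity falls from 10 to roughly 3.57 and then to roughly 1.22, so the first drop is the larger one, which is why the table reports "No" in that cell.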
6.4 Parametric Measures of Diversity
Some important special cases of the Sharma-Mittal framework for statistical measures of diversity
(r,t)-setting | Diversity measure |
---|---|
r = 0, t = 0 | \( {D}_{Sharma-Mittal}^{\left(0,0\right)}\left({p}_1,\dots, {p}_n\right)={Richness}^{\ast}\left({p}_1,\dots, {p}_n\right)=\sum \limits_{i=1}^n{p}_i^0-1 \) |
r → 1, t → 1 | \( {D}_{Sharma-Mittal}^{\left(1,1\right)}\left({p}_1,\dots, {p}_n\right)={D}_{Shannon}\left({p}_1,\dots, {p}_n\right)=\sum \limits_{i=1}^n{p}_i\log \left(\frac{1}{p_i}\right) \) |
r = 2, t = 2 | \( {D}_{Sharma-Mittal}^{\left(2,2\right)}\left({p}_1,\dots, {p}_n\right)={D}_{Gini}\left({p}_1,\dots, {p}_n\right)=1-\sum \limits_{i=1}^n{p}_i^2 \) |
r = 2, t = 0 | \( {D}_{Sharma-Mittal}^{\left(2,0\right)}\left({p}_1,\dots, {p}_n\right)={D}_{Gini-EN}^{\ast}\left({p}_1,\dots, {p}_n\right)=\frac{1}{\sum_{i=1}^n{p}_i^2}-1 \) |
r → 1, t = 0 | \( {D}_{Sharma-Mittal}^{\left(1,0\right)}\left({p}_1,\dots, {p}_n\right)={D}_{Shannon-EN}^{\ast}\left({p}_1,\dots, {p}_n\right)={e}^{\sum \limits_{i=1}^n{p}_i\log \left(\frac{1}{p_i}\right)}-1 \) |
r = ½, t = 0 | \( {D}_{Sharma-Mittal}^{\left(\frac{1}{2},0\right)}\left({p}_1,\dots, {p}_n\right)={D}_{Root}^{\ast}\left({p}_1,\dots, {p}_n\right)={\left(\sum \limits_{i=1}^n\sqrt{p_i}\right)}^2-1 \) |
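The special cases in the table can be generated by one two-parameter function. The sketch below assumes the general Sharma-Mittal expression D^{(r,t)} = [1 − (Σ p_i^r)^{(t−1)/(r−1)}] / (t − 1), with the limits r → 1 and t → 1 handled explicitly. That expression is not stated in the text above and is taken here as an assumption; it does reproduce every row of the table. Species with zero abundance are dropped, so that order 0 counts only the species present. The function name and the tolerance parameter are illustrative.

```python
import math


def sharma_mittal(p, r, t, tol=1e-12):
    """Sharma-Mittal diversity of order r and degree t (assumed general form)."""
    p = [p_i for p_i in p if p_i > 0]  # absent species are ignored
    shannon = sum(p_i * math.log(1 / p_i) for p_i in p)
    r_is_one = abs(r - 1) < tol
    t_is_one = abs(t - 1) < tol
    if r_is_one and t_is_one:
        return shannon  # Shannon entropy
    if r_is_one:
        # Limit r -> 1 at fixed degree t.
        return (1 - math.exp((1 - t) * shannon)) / (t - 1)
    power_sum = sum(p_i ** r for p_i in p)
    if t_is_one:
        # Limit t -> 1 at fixed order r.
        return math.log(power_sum) / (1 - r)
    return (1 - power_sum ** ((t - 1) / (r - 1))) / (t - 1)


example = [0.50, 0.25, 0.15, 0.10]
settings = {
    "Richness*": (0, 0),
    "Shannon": (1, 1),
    "Gini": (2, 2),
    "Gini-EN*": (2, 0),
    "Shannon-EN*": (1, 0),
    "Root*": (0.5, 0),
}
for name, (r, t) in settings.items():
    print(f"{name:12s} r={r}, t={t}: {sharma_mittal(example, r, t):.4f}")
```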
What is the meaning of the order (r) and degree (t) parameters in the Sharma-Mittal formalism when employed in the measurement of biological diversity?
Importantly, for extreme values of the order parameter, an otherwise natural idea of continuity fails in the measurement of diversity: when r goes to either zero or infinity, it is not the case that small (large) changes in the abundance distribution produce comparably small (large) changes in diversity. To illustrate, all order-0 entropies remain entirely invariant upon as large a change as that from, say, (1/3, 1/3, 1/3) to (0.98, 0.01, 0.01), while they yield clearly different values for as small a change as that from (0.98, 0.01, 0.01) to (0.99, 0.01, 0). Order-∞ entropies, in turn, remain entirely invariant upon as large a change as that from, say, (0.50, 0.25, 0.25) to (0.50, 0.50, 0), yet they still yield distinct values for as small a change as that from (0.50, 0.25, 0.25) to (0.52, 0.24, 0.24).
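The four comparisons just described can be checked directly. As a sketch, the degree is fixed at t = 0, where the order-0 measure is the number of species present minus one. For order ∞ the code assumes the limiting form 1/max(p) − 1, which depends on the largest abundance only. The function names are illustrative.

```python
def order_zero(p):
    """Order 0, degree 0: number of species present, minus one."""
    return sum(1 for p_i in p if p_i > 0) - 1


def order_infinity(p):
    """Assumed limit for r -> infinity at degree 0: 1 / max(p) - 1."""
    return 1 / max(p) - 1


# Order 0: blind to a large change, sensitive to a small one.
print(order_zero([1 / 3, 1 / 3, 1 / 3]) == order_zero([0.98, 0.01, 0.01]))  # True
print(order_zero([0.98, 0.01, 0.01]) == order_zero([0.99, 0.01, 0.0]))      # False

# Order infinity: the same pattern, driven by the most abundant species only.
print(order_infinity([0.50, 0.25, 0.25]) == order_infinity([0.50, 0.50, 0.0]))   # True
print(order_infinity([0.50, 0.25, 0.25]) == order_infinity([0.52, 0.24, 0.24]))  # False
```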
(i) Measures lying on the x-axis are obtained by positing t = 0, thus yielding:
(ii) As we have seen with some special cases like \( {D}_{Gini-EN}^{\ast}\left({p}_1,\dots, {p}_n\right) \), effective number measures of diversity may not be concave functions. Most Sharma-Mittal measures are concave, however: \( {D}_{Sharma-Mittal}^{\left(r,t\right)}\left({p}_1,\dots, {p}_n\right) \) generates a concave function as long as t ≥ 2 − 1/r (see Hoffmann 2008 for a proof). This implies, in particular, the concavity of all measures lying on the diagonal line in Fig. 6.1, which are obtained by positing r = t, thus yielding:
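For orientation, the two families named in (i) and (ii) can be written out under an assumption: that the general Sharma-Mittal expression takes the form given in the first line below. That form is not stated in the text above, but it agrees with every special case in the table. The last line shows why the condition t ≥ 2 − 1/r holds everywhere on the diagonal r = t (for r > 0).

```latex
% Assumed general form (order r, degree t; r, t \neq 1):
D_{Sharma-Mittal}^{(r,t)}(p_1,\dots,p_n)
  = \frac{1}{t-1}\left[\,1-\left(\sum_{i=1}^{n} p_i^{\,r}\right)^{\frac{t-1}{r-1}}\right]

% (i) Degree t = 0 (effective-number family, minus one):
D_{Sharma-Mittal}^{(r,0)}(p_1,\dots,p_n)
  = \left(\sum_{i=1}^{n} p_i^{\,r}\right)^{\frac{1}{1-r}} - 1

% (ii) Diagonal r = t:
D_{Sharma-Mittal}^{(r,r)}(p_1,\dots,p_n)
  = \frac{1}{r-1}\left[\,1-\sum_{i=1}^{n} p_i^{\,r}\right]

% Concavity condition on the diagonal, for r > 0:
r \ge 2-\frac{1}{r}
  \;\Longleftrightarrow\; r^2-2r+1 \ge 0
  \;\Longleftrightarrow\; (r-1)^2 \ge 0
```

Setting r = 2 in (i) gives 1/Σp_i² − 1 and in (ii) gives 1 − Σp_i², matching the D*_{Gini-EN} and D_{Gini} rows of the table.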
The relevance of statistical measures of diversity is an open issue for the theoretical biologist and the philosopher addressing the investigation of biodiversity, and indeed a matter of much debate (see, e.g., Barrantes and Sandoval 2009 and Blandin 2015). In this chapter, no claim has been made to the resolution of divergences in this respect. Consideration of the variety and integration of diversity measures remains important, however, for the debate to be adequately informed. Advocates of the measurement of diversity should of course be aware of the tools at their disposal. Opponents and skeptics, on the other hand, should be careful to make sure that their legitimate doubts are not inflated by too narrow an outlook on the ways in which the notion of biological diversity can be formally unpacked and assessed.
Footnotes
1. Our use of the term category here is very general: essentially, categories in our current sense are the elements of any partition of interest. This terminology is thus not constrained by the more technical and specific distinction between “species category” and “species taxon” (e.g., Bock 2004). In particular, a set of different taxa can be treated as a partition of categories in our terms.
2. The choice of a base for the logarithm is a matter of conventionally setting a unit of measurement. Usual options include 2, 10, and e. We will adopt the latter throughout our discussion, thus employing the natural logarithm in subsequent calculations.
3. As pointed out by Arimoto (1971, p. 186), it also turns out that \( {D}_{Root}^{\ast}\left({p}_1,\dots, {p}_n\right)=\sum \limits_{i=1}^n\sum \limits_{j=1,j\ne i}^n\sqrt{p_i{p}_j} \).
References
- Aczél, J. (1984). Measuring information beyond communication theory: Why some generalized information measures may be useful, others not. Aequationes Mathematicae, 27, 1–19.
- Arimoto, S. (1971). Information-theoretical considerations on estimation problems. Information and Control, 19, 181–194.
- Aslama, M., Hellman, H., & Sauri, T. (2004). Does market-entry regulation matter? Gazette: The International Journal for Communication Studies, 66, 113–132.
- Barrantes, G., & Sandoval, L. (2009). Conceptual and statistical problems associated with the use of diversity indices in ecology. International Journal of Tropical Biology, 57, 451–460.
- Blandin, P. (2015). La diversità del vivente prima e dopo la biodiversità. Rivista di Estetica, 59, 63–92.
- Bock, W. J. (2004). Species: The concept, category, and taxon. Journal of Zoological Systematics and Evolutionary Research, 42, 178–190.
- Chakravarty, S., & Eichhorn, W. (1991). An axiomatic characterization of a generalized index of concentration. Journal of Productivity Analysis, 2, 103–112.
- Chao, A., & Jost, L. (2012). Diversity measures. In A. Hastings & L. J. Gross (Eds.), Encyclopedia of theoretical ecology (pp. 203–207). Berkeley: University of California Press.
- Crupi, V., Nelson, J., Meder, B., Cevolani, G., & Tentori, K. (2018). Generalized information theory meets human cognition: Introducing a unified framework to model uncertainty and information search. Cognitive Science, 42, 1410–1456.
- Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10, 261–273.
- Gini, C. (1912). Variabilità e mutabilità. In Memorie di metodologia statistica, I: Variabilità e concentrazione (pp. 189–358). Milano: Giuffrè, 1939.
- Golosov, G. V. (2010). The effective number of parties: A new approach. Party Politics, 16, 171–192.
- Greenberg, J. H. (1956). The measurement of linguistic diversity. Language, 32, 109–115.
- Hill, M. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54, 427–431.
- Hoffmann, S. (2008). Generalized distribution-based diversity measurement: Survey and unification. Faculty of Economics and Management Magdeburg (Working Paper 23). http://www.ww.uni-magdeburg.de/fwwdeka/femm/a2008_Dateien/2008_23.pdf. Accessed 25 Sept 2018.
- Hoffmann, S., & Hoffmann, A. (2008). Is there a “true” diversity? Ecological Economics, 65, 213–215.
- Hurlbert, S. H. (1971). The nonconcept of species diversity: A critique and alternative parameters. Ecology, 52, 577–586.
- Jost, L. (2006). Entropy and diversity. Oikos, 113, 363–375.
- Jost, L. (2009). Mismeasuring biological diversity: Responses to Hoffmann and Hoffmann (2008). Ecological Economics, 68, 925–928.
- Keylock, J. C. (2005). Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy. Oikos, 109, 203–207.
- MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews of the Cambridge Philosophical Society, 40, 510–533.
- Patil, G., & Taillie, C. (1982). Diversity as a concept and its measurement. Journal of the American Statistical Association, 77, 548–561.
- Pielou, E. C. (1975). Ecological diversity. New York: Wiley.
- Ricotta, C. (2003). On parametric evenness measures. Journal of Theoretical Biology, 222, 189–197.
- Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379–423 and 623–656.
- Sharma, B., & Mittal, D. (1975). New non–additive measures of entropy for discrete probability distributions. Journal of Mathematical Sciences (Delhi), 10, 28–40.
- Simpson, E. H. (1949). Measurement of diversity. Nature, 163, 688.
- Tsallis, C. (1988). Possible generalization of Boltzmann-Gibbs statistics. Journal of Statistical Physics, 52, 479–487.
- Tsallis, C. (2004). What should a statistical mechanics satisfy to reflect nature? Physica D, 193, 3–34.
- Vajda, I., & Zvárová, J. (2007). On generalized entropies, Bayesian decisions, and statistical diversity. Kybernetika, 43, 675–696.
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.