# Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers

## Abstract

Conservation biologists need robust, intuitive mathematical tools to quantify and assess patterns and changes in biodiversity. Here we review some commonly used abundance-based species diversity measures and their phylogenetic generalizations. Most of the previous abundance-sensitive measures and their phylogenetic generalizations lack an essential property, the replication principle or doubling property. This often leads to inconsistent or counter-intuitive interpretations, especially in conservation applications. Hill numbers or the “effective number of species” obey the replication principle and thus resolve many of the interpretational problems. Hill numbers were recently extended to incorporate phylogeny; the resulting measures take into account phylogenetic differences between species while still satisfying the replication principle. We review the framework of phylogenetic diversity measures based on Hill numbers and their decomposition into independent alpha and beta components. Both additive and multiplicative decompositions lead to the same classes of normalized phylogenetic similarity or differentiation measures. These classes include multiple-assemblage phylogenetic generalizations of the Jaccard, Sørensen, Horn and Morisita-Horn measures. For two assemblages, these classes also include the commonly used *UniFrac* and *PhyloSør* indices as special cases. Our approach provides a mathematically rigorous, self-consistent, ecologically meaningful set of tools for conservationists who must assess the phylogenetic diversity and complementarity of potential protected areas. Our framework is applied to a real dataset to illustrate (i) how to use phylogenetic diversity profiles to completely convey species abundances and phylogenetic information among species in an assemblage; and (ii) how to use phylogenetic similarity (or differentiation) profiles to assess phylogenetic resemblance or difference among multiple assemblages.

### Keywords

Diversity · Diversity decomposition · Hill numbers · Phylogenetic diversity · Replication principle · Species diversity

## Introduction

Many of the most pressing and fundamental questions in biodiversity conservation require robust and sensible measures for quantifying and assessing changes in biodiversity. Many environmental and monitoring projects also require objective and meaningful similarity (or differentiation) measures to compare the diversities of multiple assemblages and their degree of complementarity in order to best conserve genetic, species, and ecosystem diversity. An enormous number of diversity measures and related similarity (or differentiation) indices have been proposed, not only in ecology but also in genetics, economics, information science, linguistics, physics, and social sciences, among others. See Magurran (2004) and Magurran and McGill (2011) for overviews.

In traditional species diversity measures, all species are considered to be equally different from each other; only species richness and abundances are involved. There are two general approaches: parametric and non-parametric (Magurran 2004). Parametric approaches assume a particular species abundance distribution (such as the lognormal or gamma) or a species rank abundance distribution (such as the negative binomial or log-series), and then use the parameters (e.g., Fisher’s alpha) of the distribution to quantify diversity. However, these methods often perform poorly, and the results are uninterpretable unless the “true” species abundance distribution is known (Colwell and Coddington 1994; Chao 2005). Parametric models also do not permit meaningful comparison of assemblages with different abundance distributions. For example, the parameters of a lognormal model fitted to one assemblage cannot be compared with those of a gamma model fitted to another. Non-parametric methods make no assumptions about the form of the underlying species abundance distribution. The most widely used abundance-sensitive non-parametric measures have been Shannon entropy and the Gini-Simpson index. These two measures, along with species richness, were integrated into a class of measures called generalized entropies (Havrda and Charvát 1967; Daróczy 1970; Patil and Taillie 1979; Tsallis 1988; Keylock 2005), which are briefly reviewed in this chapter.

How to quantify abundance-based species diversity in an assemblage has been one of the most controversial issues in community ecology (e.g., Hurlbert 1971; Routledge 1979; Patil and Taillie 1982; Purvis and Hector 2000; Jost 2006, 2007; Jost et al. 2010). There have also been intense debates on the choice of diversity partitioning schemes; see Ellison (2010) and the Forum that follows it. Surprisingly, all authors in that forum achieved a consensus on the use of Hill numbers, also called “effective number of species”, as the best choice to quantify abundance-based species diversity. Hill numbers are a mathematically unified family of diversity indices (differing among themselves only by a parameter *q*) that incorporate species richness and species relative abundances. They were first used in ecology by MacArthur (1965, 1972), developed by Hill (1973), and recently reintroduced to ecologists by Jost (2006, 2007).

Hill numbers obey the *replication principle* or *doubling property*, an essential mathematical property that captures biologists’ notion of diversity (MacArthur 1965; Hill 1973). This property requires that if we have *N* equally diverse, equally large assemblages with no species in common, the diversity of the pooled assemblage must be *N* times the diversity of a single assemblage. In other words, Hill numbers are linear with respect to the addition of equally common species. We review different versions of this property later. Classical diversity measures, such as Shannon entropy and the Gini-Simpson index, do *not* obey this principle and can lead to inconsistent or counter-intuitive interpretations, especially in conservation applications (Jost 2006, 2007). Hill numbers resolve many of the interpretational problems caused by classical diversity indices. Diversity measures that obey the replication principle yield self-consistent assessments in conservation applications, have intuitively interpretable magnitudes, and can be meaningfully decomposed. In this chapter, Hill numbers are adopted as a general framework for quantifying and partitioning diversities.

Pielou (1975, p. 17) was the first to notice that traditional abundance-based species diversity measures could be broadened to include phylogenetic, functional, or other differences between species. We here concentrate on phylogenetic differences, though our framework can also be extended to functional traits (Tilman 2001; Petchey and Gaston 2002; Weiher 2011). For conservation purposes, an assemblage of phylogenetically divergent species is more diverse than an assemblage consisting of closely related species, all else being equal. Phylogenetic differences among species can be based directly on their evolutionary histories, either in the form of taxonomic classification or well-supported phylogenetic trees (Faith 1992; Warwick and Clarke 1995; McPeek and Miller 1996; Crozier 1997; Helmus et al. 2007; Webb 2000; Webb et al. 2002; Pavoine et al. 2010; Ives and Helmus 2010, 2011; Vellend et al. 2011; Cavender-Bares et al. 2009, 2012 among others). Three special issues in *Ecology* were devoted to integrating ecology and phylogenetics; see McPeek and Miller (1996), Webb et al. (2006), and Cavender-Bares et al. (2012) and papers in each issue. Phylogenetic diversity measures are especially relevant for conservation applications, since they quantify the amount of evolutionary history preserved by the assemblage; see Lean and MacLaurin (chapter “The Value of Phylogenetic Diversity”).

The most widely used phylogenetic metric is Faith’s phylogenetic diversity (PD) (Faith 1992) which is defined as the sum of the branch lengths of a phylogenetic tree connecting all species in the target assemblage. As shown in Chao et al. (2010), Faith’s PD can be regarded as a phylogenetic generalization of species richness. The rarefaction formula for Faith’s PD was developed by Nipperess and Matsen (2013) and Nipperess (chapter “The Rarefaction of Phylogenetic Diversity: Formulation, Extension and Application”). Recently, Chao et al. (2015) derived an integrated sampling, rarefaction, and extrapolation methodology to compare Faith’s PD of a set of assemblages. Like species richness, Faith’s PD does not consider species abundances. For some conservation applications, the mere presence or absence of a species is all that matters, or all that can be determined from the available data. In those cases, Faith’s PD is a good measure of phylogenetic diversity. However, there are important advantages to incorporating abundance information into phylogenetic diversity measures for conservation. For example, some human impacts can result in the phylogenetic simplification of an ecosystem, reducing the population shares of phylogenetically distinct species relative to typical species. An abundance-based measure can catch this effect before it leads to actual extinctions.

Ecosystem simplification may be worthy of conservation concern even if it does not lead to extinctions of focal organisms. Often, the focal organisms for conservation represent a tiny fraction of the ecosystem’s biomass or richness. Each focal species will be tied to a web of non-focal species whose abundances are not usually monitored (e.g., insects). All else being equal, a more equitable distribution of the abundances of focal organisms will be able to support a more diverse, robust and stable set of non-focal species. Faith (chapter “Using Phylogenetic Dissimilarities Among Sites for Biodiversity Assessments and Conservation”) rightly argues that phylogenetic diversity is a good proxy for functional diversity. Therefore an ecosystem with a more equitable distribution of abundance across phylogenetic lineages should also exhibit greater functional complexity (per interaction between individuals) than an ecosystem whose phylogenetically unusual elements are rare. If we have to prioritize such ecosystems, the more phylogenetically equitable one, which thoroughly integrates diverse lineages, should be preferred. In addition to being more resistant to lineage extinctions, a complex, well-integrated ecosystem may be worth preserving in and of itself, above and beyond its component species; conservation is not just about species. Evolution may take a different course in ecosystems whose members are constantly surprised by their interactions compared with an ecosystem whose interactors are highly predictable. These conservation goals – robustness against extinction of distinctive lineages, and preservation of well-integrated ecosystems with unique future option values – require phylogenetic diversity measures that incorporate species importance values.

Rao’s quadratic entropy *Q* (Rao 1982), a generalization of the Gini-Simpson index, was the first diversity measure that accounts for both phylogeny and species abundances. The phylogenetic entropy \( H_P \) (Allen et al. 2009) extends Shannon entropy to incorporate phylogenetic distances among species. Since Shannon entropy and the Gini-Simpson index do not obey the replication principle, neither do their phylogenetic generalizations. These generalizations will therefore have the same interpretational problems as their parent measures; see Chao et al. (2010, their Supplementary Material) for examples.

Chao et al. (2010) extended Hill numbers and related similarity measures to incorporate phylogeny. The new phylogenetic Hill numbers obey a generalized replication principle. Their measures were subsequently extended by Faith and Richards (2012) and Faith (2013). Both the original Hill numbers and their phylogenetic generalizations facilitate diversity decomposition (Jost 2007; Chiu et al. 2014). As with the original Hill numbers, both additive and multiplicative decompositions of phylogenetic Hill numbers lead to the same classes of similarity (or differentiation) measures. Hill numbers therefore provide a unified framework to quantify both abundance-based and phylogenetic diversity.

In this chapter, we first briefly review the classic abundance-based species diversity measures (section “Generalized Entropies”) and their phylogenetic generalizations (section “Phylogenetic generalized entropies”) for an assemblage. Then we focus on the framework of Hill numbers (section “Hill numbers and the replication principle”), phylogenetic Hill numbers (section “Phylogenetic Hill numbers and related measures”) and related phylogenetic diversity measures. We also discuss the replication principle and its phylogenetic generalization (section “Replication principle for phylogenetic diversity measures”). For multiple assemblages, we review the diversity decomposition based on phylogenetic diversity measures (section “Decomposition of phylogenetic diversity measures”). The associated phylogenetic similarity and differentiation measures are then presented (section “Normalized phylogenetic similarity measures”). We use a real example for illustration (section “An example”). Our practical recommendations are provided in section “Conclusion”.

## Classic Measures and Their Phylogenetic Generalizations

### Generalized Entropies

The species richness of an assemblage is a simple count of the number of species present. It is the most intuitive and frequently used measure of biodiversity, and is a key metric in conservation biology (MacArthur and Wilson 1967; Hubbell 2001; Magurran 2004). However, it does not incorporate any information about the abundances of species, and it is a very hard number to estimate accurately from small samples (Colwell and Coddington 1994; Chao 2005; Gotelli and Colwell 2011).

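The most widely used abundance-sensitive measure is Shannon entropy,

\[ H_{Sh} = -\sum_{i=1}^{S} p_i \log p_i, \tag{1a} \]

where *S* is the number of species in the assemblage, and the *i*th species has relative abundance \( p_i \). Shannon entropy gives the uncertainty in the species identity of a randomly chosen individual in the assemblage. Another popular measure is the Gini-Simpson index,

\[ H_{GS} = 1 - \sum_{i=1}^{S} p_i^2, \tag{1b} \]

which gives the probability that two individuals randomly selected (with replacement) from the assemblage belong to different species. Species richness (via *S* − 1), Shannon entropy, and the Gini-Simpson index all belong to a single family, the *generalized entropy*:

\[ {}^qH = \frac{1 - \sum_{i=1}^{S} p_i^q}{q - 1}, \quad q \neq 1. \tag{1c} \]

The parameter *q* determines the sensitivity of the measure to the relative frequencies of the species. When *q* = 0, \( {}^qH \) becomes *S* − 1; when *q* tends to 1, \( {}^qH \) tends to Shannon entropy; when *q* = 2, \( {}^qH \) reduces to the Gini-Simpson index. This family has been discovered many times in different disciplines (Havrda and Charvát 1967; Daróczy 1970; Patil and Taillie 1979; Tsallis 1988; Keylock 2005). There are many other families of generalized entropies, notably the Rényi entropies (Rényi 1961).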
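As a quick numerical check of Eqs. (1a) to (1c), the Python sketch below computes the three measures and confirms the special cases *q* = 0, 1 and 2. It assumes natural logarithms (the text does not fix the base) and uses a hypothetical four-species abundance vector; the function names are illustrative only.

```python
import math


def shannon(p):
    """Shannon entropy H_Sh = -sum_i p_i log p_i (natural log), Eq. (1a)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def gini_simpson(p):
    """Gini-Simpson index H_GS = 1 - sum_i p_i^2, Eq. (1b)."""
    return 1.0 - sum(x * x for x in p)


def generalized_entropy(p, q):
    """Generalized entropy qH = (1 - sum_i p_i^q) / (q - 1), Eq. (1c).

    q = 1 is handled as the limiting case, Shannon entropy.
    """
    present = [x for x in p if x > 0]  # absent species must not count at q = 0
    if abs(q - 1.0) < 1e-12:
        return shannon(present)
    return (1.0 - sum(x ** q for x in present)) / (q - 1.0)


p = [0.4, 0.3, 0.2, 0.1]  # hypothetical relative abundances, S = 4
print(generalized_entropy(p, 0))                      # 3.0, i.e. S - 1
print(generalized_entropy(p, 2), gini_simpson(p))     # equal
print(generalized_entropy(p, 1.000001), shannon(p))   # nearly equal
```

Filtering out zero abundances matters only for *q* = 0, where `0 ** 0` would otherwise count an absent species.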

Although the traditional abundance-sensitive generalized entropies and their special cases have been useful in many disciplines (e.g., see Magurran 2004), they do not behave in the same intuitive linear way as species richness. In ecosystems with high diversity, mass extinctions hardly affect their values (Jost 2010). They also lead to logical contradictions in conservation biology, because they do not measure a conserved quantity (e.g., under a given conservation plan, the proportion of “diversity” lost and the proportion preserved can *both* be 90 % or more); see Jost (2006, 2007) and Jost et al. (2010). Thus, changes in their magnitude cannot be properly compared or interpreted. Also, the main measure of similarity in the additive approach for traditional measures, the within-group or “alpha” diversity divided by the total or “gamma” diversity, does not actually quantify the compositional similarity of the assemblages under study. This ratio can be arbitrarily close to unity (supposedly indicating high similarity) even when the assemblages being compared have no species in common. Finally, these measures each use different units (e.g., the Gini-Simpson index is a probability whereas Shannon entropy is in units of information), so they cannot be compared with each other. All these problems are consequences of their failure to satisfy the replication principle. Hill numbers obey the replication principle and resolve all these problems; see section “Hill numbers and the replication principle”.

### Phylogenetic Generalized Entropies

The classic measures reviewed in section “Generalized Entropies” have been extended to incorporate phylogenetic distances between species. As mentioned in the Introduction, and as will be shown in section “Phylogenetic Hill numbers and related measures”, Faith’s PD can be regarded as a phylogenetic generalization of species richness.

Rao’s *quadratic entropy* takes account of both phylogeny and species abundances (Rao 1982):

\[ Q = \sum_{i=1}^{S}\sum_{j=1}^{S} d_{ij}\, p_i\, p_j, \tag{2a} \]

where \( d_{ij} \) denotes the phylogenetic distance (in years since divergence, number of DNA base changes, or other metric) between species *i* and *j*, and \( p_i \) and \( p_j \) denote the relative abundances of species *i* and *j*. This index measures the average phylogenetic distance between two individuals randomly selected from the assemblage. Rao’s *Q* represents a phylogenetic generalization of the Gini-Simpson index because, in the special case of no phylogenetic structure (all species equally related to one another), \( d_{ii} = 0 \) and \( d_{ij} = 1 \) (\( i \neq j \)), it reduces to the Gini-Simpson index.

The *phylogenetic entropy* \( H_P \) is a generalization of Shannon entropy that incorporates phylogenetic distances among species (Allen et al. 2009):

\[ H_P = -\sum_{i} L_i\, a_i \log a_i, \tag{2b} \]

where the sum runs over all branches of the tree, \( L_i \) is the length of branch *i*, and \( a_i \) denotes the summed relative abundance of all species descended from branch *i*.

Faith’s PD, \( H_P \), and Rao’s *Q* can be united into a single parametric family of *phylogenetic generalized entropies* (Pavoine et al. 2009):

\[ {}^qI = \frac{T - \sum_{i} L_i\, a_i^q}{q - 1}, \quad q \neq 1, \tag{2c} \]

where \( L_i \) and \( a_i \) are defined as in Eq. (2b) and *T* is the age of the root node of the tree. Then \( {}^0I \) = Faith’s PD minus *T*; \( {}^1I \) (the limit as *q* tends to 1) is identical to Allen et al.’s entropy \( H_P \) given in Eq. (2b); and \( {}^2I \) is identical to Rao’s quadratic entropy *Q* given in Eq. (2a), taking \( d_{ij} \) as the time since divergence of species *i* and *j*. In the special case that *T* = 1 (the tree height is normalized to unit length) and all branches have unit length, the phylogenetic generalized entropy reduces to the classical generalized entropy defined in Eq. (1c), with species relative abundances \( \{p_1, p_2, \ldots, p_S\} \) as the tip-node abundances.
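The special cases of Eq. (2c) can be checked on a small tree. The Python sketch below uses a hypothetical ultrametric four-species tree (root age 4, clades (A, B) and (C, D)) stored as a child-to-parent map with branch lengths; this representation, the abundances, and all function names are assumptions made for illustration. It computes \( a_i \) by walking from each tip to the root, evaluates \( {}^qI \), and compares \( {}^0I \) with Faith’s PD minus *T* and \( {}^2I \) with Rao’s *Q* computed from divergence times.

```python
import math
from itertools import product

# Hypothetical ultrametric toy tree with root age T = 4.
# Each non-root node maps to (parent, length of the branch above it).
TREE = {"X": ("root", 2.0), "Y": ("root", 3.0),
        "A": ("X", 2.0), "B": ("X", 2.0),
        "C": ("Y", 1.0), "D": ("Y", 1.0)}
# Node ages (time before present); tips are at age 0.
AGES = {"root": 4.0, "X": 2.0, "Y": 1.0, "A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
T = AGES["root"]


def ancestors(tree, node):
    """The node itself followed by all of its ancestors up to the root."""
    path = [node]
    while node in tree:
        node = tree[node][0]
        path.append(node)
    return path


def branch_abundances(tree, p):
    """a_i: summed relative abundance of all tips descended from branch i."""
    a = {node: 0.0 for node in tree}
    for tip, share in p.items():
        for node in ancestors(tree, tip)[:-1]:  # each branch on the path, root excluded
            a[node] += share
    return a


def phylo_generalized_entropy(tree, p, q, T):
    """qI = (T - sum_i L_i a_i^q) / (q - 1), Eq. (2c); q = 1 gives H_P, Eq. (2b)."""
    a = branch_abundances(tree, p)
    terms = [(length, a[n]) for n, (_, length) in tree.items() if a[n] > 0]
    if abs(q - 1.0) < 1e-12:
        return -sum(L * ai * math.log(ai) for L, ai in terms)
    return (T - sum(L * ai ** q for L, ai in terms)) / (q - 1.0)


def rao_q(tree, ages, p):
    """Rao's Q, Eq. (2a), with d_ij = age of the most recent common ancestor."""
    def d(i, j):
        up_j = set(ancestors(tree, j))
        mrca = next(n for n in ancestors(tree, i) if n in up_j)
        return ages[mrca]
    return sum(d(i, j) * p[i] * p[j] for i, j in product(p, p))


p = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}  # hypothetical abundances
faith_pd = sum(length for _, length in TREE.values())
print(phylo_generalized_entropy(TREE, p, 0, T), faith_pd - T)           # 0I = PD - T
print(phylo_generalized_entropy(TREE, p, 2, T), rao_q(TREE, AGES, p))   # 2I = Q
```

Because every tip is present here, Faith’s PD is simply the sum of all branch lengths (11).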

The abundance-sensitive (*q* > 0) phylogenetic generalized entropies provide useful information, but they do not obey the replication principle and thus have the same interpretational problems as their parent measures. This motivated Chao et al. (2010) to extend Hill numbers to phylogenetic Hill numbers, which obey the replication principle; see section “Phylogenetic Hill numbers and related measures”.

## Hill Numbers and Their Phylogenetic Generalizations

### Hill Numbers and the Replication Principle

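Shannon entropy and the Gini-Simpson index can be converted into measures expressed in units of species: \( \exp(H_{Sh}) \) and \( 1/(1 - H_{GS}) \), respectively. Hill (1973) integrated species richness and the converted Shannon and Gini-Simpson measures into a class of diversity measures called “*Hill numbers*” of order *q*, or the “*effective number of species*”, defined as

\[ {}^qD = \left(\sum_{i=1}^{S} p_i^q\right)^{1/(1-q)}, \quad q \neq 1. \tag{3a} \]

This expression is undefined for *q* = 1, but its limit as *q* tends to 1 exists and gives

\[ {}^1D = \lim_{q \to 1} {}^qD = \exp\left(-\sum_{i=1}^{S} p_i \log p_i\right) = \exp(H_{Sh}). \tag{3b} \]

The relationship between the Hill number of order *q* (*q* ≠ 1) and the generalized entropy can be expressed as

\[ {}^qD = \left[1 - (q - 1)\, {}^qH\right]^{1/(1-q)}. \tag{3c} \]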

When *q* = 0, species abundances do not count at all and \( {}^0D = S \) is obtained. When *q* = 1, species are weighted in proportion to their frequencies, and the measure \( {}^1D \) (Eq. (3b)) can be interpreted as the effective number of common or “typical” species (i.e., species with typical abundances) in the assemblage. When *q* = 2, abundant species are favored and rare species are discounted; the measure \( {}^2D \) becomes the inverse Simpson concentration, \( 1/\sum_i p_i^2 \), and can be interpreted as the effective number of dominant or very abundant species in the assemblage. In general, if \( {}^qD = x \), then the diversity of order *q* of this community is the same as that of an idealized reference community with *x* equally abundant species. All Hill numbers are in units of “species”, so they can be plotted on a single graph as a continuous function of the parameter *q*. This *diversity profile* characterizes the species-abundance distribution of an assemblage and provides complete information about its diversity. The steepness of its slope graphically illustrates the degree of dominance in the assemblage. An example is given in section “An example”.
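A minimal Python sketch of Eqs. (3a) to (3c) and of a diversity profile follows. The two abundance vectors are hypothetical; the names `hill_number`, `hill_from_entropy`, and `diversity_profile` are illustrative, and natural logarithms are assumed. An even assemblage gives a flat profile at 4 species, while a dominated assemblage gives a profile that starts at 4 (richness) and falls as *q* increases.

```python
import math


def hill_number(p, q):
    """Hill number of order q, Eq. (3a); q = 1 is the limit exp(H_Sh), Eq. (3b)."""
    present = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in present))
    return sum(x ** q for x in present) ** (1.0 / (1.0 - q))


def hill_from_entropy(H, q):
    """Convert a generalized entropy qH (q != 1) into a Hill number, Eq. (3c)."""
    return (1.0 - (q - 1.0) * H) ** (1.0 / (1.0 - q))


def diversity_profile(p, qs):
    """(q, qD) pairs; plotting them against q gives the diversity profile."""
    return [(q, hill_number(p, q)) for q in qs]


even = [0.25, 0.25, 0.25, 0.25]   # hypothetical: four equally abundant species
skewed = [0.7, 0.1, 0.1, 0.1]     # hypothetical: one dominant species
qs = [0, 0.5, 1, 2, 3]
for name, p in (("even", even), ("skewed", skewed)):
    print(name, [round(d, 3) for _, d in diversity_profile(p, qs)])
```

The steeper decline of the `skewed` profile is the graphical signature of dominance described above.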

Hill numbers differ fundamentally from Shannon entropy and the Gini-Simpson index in that they obey the replication principle. Hill (1973) proved a weak version of the doubling property: if two completely distinct assemblages (i.e., no species in common) have identical relative abundance distributions, then the Hill number doubles if the assemblages are combined with equal weights. Chiu et al. (2014, their Appendix B) recently proved a strong version of the doubling property: if two completely distinct assemblages have identical Hill numbers of order *q* (relative abundance distributions may be different, unlike the weak version), then the Hill number of the same order doubles if the two assemblages are combined with equal weights. Species richness is a Hill number (with *q* = 0) and obeys both versions of the doubling property, but most other diversity indices do not obey even the weak version. Because Hill numbers obey this replication principle, changes in their magnitude have simple interpretations, and the ratio of alpha diversity to gamma diversity accurately reflects the compositional similarity of the communities. The replication principle is best known in economics, where it has long been recognized as an important property of concentration and diversity measures (Hannah and Kay 1977). In ecology, the doubling property has been extensively discussed by many authors (MacArthur 1965, 1972; Hill 1973; Whittaker 1972; Routledge 1979; Peet 1974; Jost 2006, 2007, 2009; Ricotta and Szeidl 2009; Jost et al. 2010) and has been extended to phylogenetic measures (Chao et al. 2010); see below.
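The two versions of the doubling property can be illustrated numerically. In the Python sketch below (hypothetical abundance vectors; function names are illustrative), pooling two assemblages with no shared species and equal weights halves every relative abundance. The weak version uses identical distributions; the strong version uses two different distributions that share the same Hill number of order 2, namely \( {}^2D = 2 \) for (1/2, 1/2) and for (2/3, 1/6, 1/6). Shannon entropy, by contrast, only increases by log 2.

```python
import math


def hill_number(p, q):
    """Hill number of order q, Eq. (3a), with the q = 1 limit of Eq. (3b)."""
    present = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in present))
    return sum(x ** q for x in present) ** (1.0 / (1.0 - q))


def shannon(p):
    """Shannon entropy (natural log), for contrast."""
    return -sum(x * math.log(x) for x in p if x > 0)


def pool_equal_weights(p1, p2):
    """Pool two assemblages with no species in common, each with weight 1/2."""
    return [x / 2 for x in p1] + [x / 2 for x in p2]


# Weak version: identical relative abundance distributions.
p1 = [0.5, 0.3, 0.2]
weak = pool_equal_weights(p1, p1)
for q in (0, 1, 2):
    print(q, hill_number(p1, q), hill_number(weak, q))  # second value is twice the first

# Strong version at q = 2: different distributions with the same 2D.
pa = [0.5, 0.5]
pb = [2 / 3, 1 / 6, 1 / 6]
strong = pool_equal_weights(pa, pb)
print(hill_number(pa, 2), hill_number(pb, 2), hill_number(strong, 2))  # about 2, 2, 4

# Shannon entropy gains only log 2 instead of doubling.
print(shannon(weak) - shannon(p1), math.log(2))
```

Note that `pa` and `pb` differ in richness, so their Hill numbers of order 0 do not match; the strong version applies order by order.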

### Phylogenetic Hill Numbers and Related Measures

To incorporate phylogeny, Chao et al. (2010) take a cross-section (slice) through an ultrametric phylogenetic tree at a time *t* in the past; see the left panel of Fig. 1 (reproduced from Chao et al. 2010) for illustration and details about how to deal with shared lineages. The number of lineages at that time is the number of branch cuts, and the relative importance of each of these lineages for the present-day assemblage is the sum of the relative abundances of the branch’s descendants in the present-day assemblage. Using these relative importance values, we can calculate the generalized entropy of order *q* for the slice. The mean of these entropies, beginning at time −*T* (i.e., *T* years before present) and continuing until the present, is converted to a Hill number using Eq. (3c). This is the *phylogenetic Hill number*, which conveys information about the shape of the tree over the time interval of interest. Chao et al. (2010) symbolize it as \( {}^q\overline{D}(T) \), and also refer to it as the *mean phylogenetic diversity of order q* over *T* years (or simply the *mean diversity* for the interval [−*T*, 0]):

\[ {}^q\overline{D}(T) = \left(\sum_{i \in \mathbf{B}_T} \frac{L_i}{T}\, a_i^q\right)^{1/(1-q)}, \quad q \neq 1, \tag{4a} \]

\[ {}^1\overline{D}(T) = \lim_{q \to 1} {}^q\overline{D}(T) = \exp\left(-\sum_{i \in \mathbf{B}_T} \frac{L_i}{T}\, a_i \log a_i\right), \tag{4b} \]

where \( \mathbf{B}_T \) is the set of all branches in the time interval [−*T*, 0], \( L_i \) is the length of branch *i* in the set \( \mathbf{B}_T \), and \( a_i \) is the total relative abundance descended from branch *i*. The mean diversity \( {}^q\overline{D}(T) \) is interpreted as “the effective number of equally abundant and *equally distinct* lineages all with branch lengths *T* during the time interval from *T* years ago to the present”. Here “equally distinct” also implies that the phylogenetic distance between any two species is *T*, so lineages are completely distinct (i.e., there are no shared branches).

The phylogenetic Hill numbers are invariant to the units used to measure branch lengths. When all lineages are completely distinct, the measure \( {}{}^q\overline{D}\;(T) \) reduces to the Hill numbers \( {}^qD={\left({\displaystyle \sum}_i{a}_i^q\right)}^{1/\left(1-q\right)} \). This includes the special case that *T* tends to zero, i.e., the case that we ignore phylogeny and only consider the present-day community. This shows that the framework based on Hill numbers provides a unified approach to integrate abundances and phylogeny. Also, here we have a simple idealized reference tree to understand the value of \( {}{}^q\overline{D}\;(T)=z \) for an arbitrary tree: the mean phylogenetic diversity of the tree over the time period [−*T*, 0] is the same as the diversity of an idealized assemblage consisting of *z* equally abundant and equally distinct lineages all with branch length *T.*
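Eqs. (4a) and (4b) need only the pairs \( (L_i, a_i) \) for the branches in \( \mathbf{B}_T \). The Python sketch below takes such a list directly (an assumed input format) for a hypothetical ultrametric tree with root age 4, clades (A, B) and (C, D), and abundances (0.4, 0.3, 0.2, 0.1). It also builds a star tree whose four lineages are completely distinct, each with branch length *T*, and checks that the measure then reduces to the ordinary Hill numbers.

```python
import math


def mean_phylo_diversity(branches, q, T):
    """Mean phylogenetic diversity qD-bar(T), Eqs. (4a) and (4b).

    branches: (L_i, a_i) pairs for the branches in B_T, where L_i is the length
    of branch i within [-T, 0] and a_i its total descendant relative abundance.
    """
    used = [(L, a) for L, a in branches if L > 0 and a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L / T * a * math.log(a) for L, a in used))
    return sum(L / T * a ** q for L, a in used) ** (1.0 / (1.0 - q))


def hill_number(p, q):
    """Ordinary Hill number, Eqs. (3a) and (3b), for comparison."""
    present = [x for x in p if x > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(x * math.log(x) for x in present))
    return sum(x ** q for x in present) ** (1.0 / (1.0 - q))


# Hypothetical ultrametric toy tree, root age T = 4: root -> X (2), root -> Y (3),
# X -> A (2), X -> B (2), Y -> C (1), Y -> D (1); p = (0.4, 0.3, 0.2, 0.1).
TOY = [(2.0, 0.7), (3.0, 0.3), (2.0, 0.4), (2.0, 0.3), (1.0, 0.2), (1.0, 0.1)]
T = 4.0
print([round(mean_phylo_diversity(TOY, q, T), 3) for q in (0, 1, 2)])

# Completely distinct lineages: a star tree with every branch of length T.
p = [0.4, 0.3, 0.2, 0.1]
star = [(T, x) for x in p]
print([round(mean_phylo_diversity(star, q, T), 3) for q in (0, 1, 2)])
print([round(hill_number(p, q), 3) for q in (0, 1, 2)])
```

Multiplying every branch length and *T* by the same factor leaves the output unchanged, which is the unit invariance noted above.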

For *q* = 0, when *T* is chosen as the age of the root node, we have \( {}^0\overline{D}(T)=\mathrm{Faith}'\mathrm{s}\;\mathrm{PD}/T \), which can be interpreted as *lineage richness*. Faith’s PD can thus be regarded as a phylogenetic generalization of species richness. We can roughly interpret \( {}^1\overline{D}(T) \) as the effective number of *common lineages*, and \( {}^2\overline{D}(T) \) as the effective number of *dominant lineages* in the time period [−*T*, 0]. When *T* is chosen as the age of the root node, a simple relationship exists between the phylogenetic entropy \( H_P \) (Allen et al. 2009) and the measure \( {}^1\overline{D}(T) \):

\[ {}^1\overline{D}(T) = \exp\left(H_P / T\right). \tag{4c} \]

For *q* = 2, when *T* is chosen as the age of the root node, there is a simple relationship between our measures and the widely used Rao’s quadratic entropy *Q* (Chao et al. 2010):

\[ {}^2\overline{D}(T) = \frac{1}{1 - Q/T}. \tag{4d} \]
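The relationships for *q* = 0, 1 and 2 can be checked on a hypothetical ultrametric toy tree (root age 4; A and B split 2 time units ago, C and D 1 unit ago). In this Python sketch, \( H_P \) is computed from Eq. (2b) and Rao’s *Q* from Eq. (2a) with \( d_{ij} \) taken as the time since divergence; the tree, abundances, and names are illustrative assumptions.

```python
import math

T = 4.0  # root age of a hypothetical ultrametric toy tree
# (L_i, a_i) for every branch: internal X and Y, then tips A, B, C, D
BRANCHES = [(2.0, 0.7), (3.0, 0.3), (2.0, 0.4), (2.0, 0.3), (1.0, 0.2), (1.0, 0.1)]
P = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
# Divergence times: A and B split 2 units ago, C and D 1 unit ago,
# and the two clades at the root (age T).
SPLITS = {frozenset("AB"): 2.0, frozenset("CD"): 1.0}


def divergence_time(i, j):
    """d_ij for Rao's Q: zero for i == j, otherwise time since divergence."""
    return 0.0 if i == j else SPLITS.get(frozenset((i, j)), T)


def mean_phylo_diversity(q):
    """qD-bar(T) for the toy tree, Eqs. (4a) and (4b)."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L / T * a * math.log(a) for L, a in BRANCHES))
    return sum(L / T * a ** q for L, a in BRANCHES) ** (1.0 / (1.0 - q))


faith_pd = sum(L for L, _ in BRANCHES)
H_P = -sum(L * a * math.log(a) for L, a in BRANCHES)                     # Eq. (2b)
rao_Q = sum(divergence_time(i, j) * P[i] * P[j] for i in P for j in P)  # Eq. (2a)

print(mean_phylo_diversity(0), faith_pd / T)              # lineage richness
print(mean_phylo_diversity(1), math.exp(H_P / T))         # Eq. (4c)
print(mean_phylo_diversity(2), 1.0 / (1.0 - rao_Q / T))   # Eq. (4d)
```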

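The *branch or phylogenetic diversity* \( {}^qPD(T) \) of order *q* during the time interval from *T* years ago to the present is defined as the product of \( {}^q\overline{D}(T) \) and *T*. It quantifies the amount of evolutionary history in the system over the interval [−*T*, 0], or “the effective total branch length” (Chao et al. 2010):

\[ {}^qPD(T) = T \times {}^q\overline{D}(T) = \left(\sum_{i \in \mathbf{B}_T} L_i \left(\frac{a_i}{T}\right)^q\right)^{1/(1-q)}, \quad q \neq 1, \tag{5a} \]

\[ {}^1PD(T) = \lim_{q \to 1} {}^qPD(T) = \exp\left(-\sum_{i \in \mathbf{B}_T} L_i\, \frac{a_i}{T} \log \frac{a_i}{T}\right). \tag{5b} \]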

If *q* = 0 and *T* is the age of the root node, then \( {}^0PD(T) \) reduces to Faith’s PD, regardless of branching pattern or abundances. As explained by Chao et al. (2010), we can imagine that all the branch segments in the interval [−*T*, 0] form a single assemblage with relative abundance set \( \{a_i/T;\ i \in \mathbf{B}_T\} \). In this assemblage, for each *i* there are \( L_i \) “branches” with relative abundance \( a_i/T \). Then the Hill number of order *q* for this assemblage is exactly the branch diversity \( {}^qPD(T) \) given in Eq. (5a). Dividing this Hill number by *T*, we obtain \( {}^q\overline{D}(T) \) given in Eq. (4a). Note that in our framework \( {}^qPD(T) \) is truly a class of Hill numbers (“the effective number of lineage-years”), whereas \( {}^q\overline{D}(T) \) (“the effective number of lineages”) denotes a (generalized) mean of Hill numbers. See Faith and Richards (2012) and Faith (2013) for extensions of the measure \( {}^qPD(T) \).
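The “lineage-year” reading of Eq. (5a) translates directly into code: each branch contributes \( L_i \) units of relative abundance \( a_i/T \), and these weights sum to 1. The Python sketch below (hypothetical toy tree with root age 4, same `(L_i, a_i)` input format as in the earlier sketches; names are illustrative) computes \( {}^qPD(T) \) this way and checks that it equals *T* times \( {}^q\overline{D}(T) \) and, for *q* = 0, Faith’s PD.

```python
import math


def branch_diversity(branches, q, T):
    """qPD(T), Eqs. (5a) and (5b): the Hill number of an assemblage in which
    branch i contributes L_i units, each with relative abundance a_i / T."""
    used = [(L, a / T) for L, a in branches if L > 0 and a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L * w * math.log(w) for L, w in used))
    return sum(L * w ** q for L, w in used) ** (1.0 / (1.0 - q))


def mean_phylo_diversity(branches, q, T):
    """qD-bar(T), Eqs. (4a) and (4b)."""
    used = [(L, a) for L, a in branches if L > 0 and a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L / T * a * math.log(a) for L, a in used))
    return sum(L / T * a ** q for L, a in used) ** (1.0 / (1.0 - q))


# Hypothetical ultrametric toy tree (root age 4) as (L_i, a_i) pairs.
TOY = [(2.0, 0.7), (3.0, 0.3), (2.0, 0.4), (2.0, 0.3), (1.0, 0.2), (1.0, 0.1)]
T = 4.0
weights = sum(L * a / T for L, a in TOY)  # lineage-year abundances sum to 1
for q in (0, 1, 2):
    print(q, branch_diversity(TOY, q, T), T * mean_phylo_diversity(TOY, q, T))
```

For *q* = 0 both columns equal the total branch length of the toy tree, 11.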

The measures \( {}^q\overline{D}(T) \) and \( {}^qPD(T) \) depend explicitly on two parameters: the abundance sensitivity parameter *q* and the time perspective (or time-depth) parameter *T*. The reasons we need this time-depth parameter, and our suggestions for choosing a time perspective, are as follows.

1. When we compare the phylogenetic diversities of several assemblages based on the measures \( {}^q\overline{D}(T) \) and \( {}^qPD(T) \), all measures should refer to the same time period to make meaningful comparisons. That is, the time-depth *T* should be the same for all assemblages. Therefore, a parameter is required to specify the time-depth.
2. The choice of time perspective should reflect an investigator’s aims and facilitate comparisons with other studies. We suggest that at least two time perspectives be included: *T* = 0, and *T* = the age of the root node of a phylogenetic tree connecting all species in the study. For the case of *T* = 0 (more precisely, the limit as *T* tends to zero), the phylogeny is ignored and the diversity profile reduces to the profile of the present-day assemblage based on the ordinary Hill numbers. If we choose *T* to be the age of the oldest (root) node in the tree, we recover some of the standard measures of phylogenetic diversity (see Eqs. (4c) and (4d)).
3. As suggested in Chiu et al. (2014), other time perspectives can be selected, such as *T* = the age of the node at which the group of interest diverges from the rest of the species. This choice of *T* is independent of the species actually sampled, so it allows statistically robust comparisons across investigations and regions (unlike the conventional choice of *T* as the root node of the tree containing the species actually observed). This choice also provides an accurate measure of the proportion of a taxonomic group’s evolutionary history preserved in a given assemblage. Another choice is the time of the most recent common ancestor of all taxa alive today. Other choices may be made, depending on the purpose of an investigation. The formula in Chiu et al. (2014, p. 42) can be used to convert phylogenetic diversity from one temporal perspective to another.

To see how the measures vary with *q* and time perspective *T*, we recommend using two types of profiles to characterize completely the phylogenetic tree information and species abundances, as described below; see section “An example” for examples. (1) The first type of diversity profile is obtained by plotting \( {}^qPD(T) \) or \( {}^q\overline{D}(T) \) as a function of order *q*, as *q* varies from 0 to about 3 or 4 (beyond which there is usually little change), for some selected values of the temporal perspective *T*. For this type of profile, \( {}^qPD(T) \) and \( {}^q\overline{D}(T) \) have the same shape when *T* is fixed (they differ only by the constant factor *T*), so it is sufficient to plot the profile for only one measure. (2) The second type of diversity profile is obtained by plotting \( {}^qPD(T) \) and \( {}^q\overline{D}(T) \) as functions of *T*, separately for *q* = 0, 1, and 2. This profile shows the effect of time-depth, or evolutionary change, on our diversity measures.
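The second type of profile requires re-slicing the tree for each *T*. A minimal Python sketch, assuming each node of an ultrametric tree is stored with its parent and its age: the part of the branch above a node that lies in [−*T*, 0] has length min(parent age, *T*) − node age, and branches entirely older than *T* drop out of \( \mathbf{B}_T \). The tree, abundances, and function names are hypothetical. Because Eq. (4a) divides by *T*, the value *T* = 0 cannot be plugged in directly; any *T* younger than the youngest internal node (here *T* = 0.5) already gives the ordinary Hill numbers.

```python
import math

# Hypothetical ultrametric toy tree: node -> (parent, node age); root age is 4.
NODES = {"X": ("root", 2.0), "Y": ("root", 1.0),
         "A": ("X", 0.0), "B": ("X", 0.0), "C": ("Y", 0.0), "D": ("Y", 0.0)}
AGES = {"root": 4.0, **{n: age for n, (_, age) in NODES.items()}}
P = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}  # present-day relative abundances


def node_abundance(node):
    """Summed present-day relative abundance of the tips descended from node."""
    if node in P:
        return P[node]
    return sum(node_abundance(c) for c, (parent, _) in NODES.items() if parent == node)


def branches_in_window(T):
    """(L_i, a_i) for the branch segments lying in the interval [-T, 0]."""
    segments = []
    for child, (parent, child_age) in NODES.items():
        length = min(AGES[parent], T) - child_age
        if length > 0:  # segments lying entirely before -T are not in B_T
            segments.append((length, node_abundance(child)))
    return segments


def mean_diversity(T, q):
    """qD-bar(T), Eqs. (4a) and (4b), for 0 < T <= root age."""
    segs = branches_in_window(T)
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L / T * a * math.log(a) for L, a in segs))
    return sum(L / T * a ** q for L, a in segs) ** (1.0 / (1.0 - q))


# Second type of profile: qD-bar(T) and qPD(T) = T * qD-bar(T) against T.
for T in (0.5, 1.0, 2.0, 3.0, 4.0):
    row = [(round(mean_diversity(T, q), 3), round(T * mean_diversity(T, q), 3))
           for q in (0, 1, 2)]
    print(T, row)
```

For *q* = 0 this toy tree gives \( {}^0\overline{D}(T) \) falling from 4 to 2.75 while Faith’s PD rises from 2 to 11, matching the contrasting patterns described in the next paragraph.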

For the second type of profile, \( {}^qPD(T) \) and \( {}^q\overline{D}(T) \) generally exhibit different patterns: the profile of \( {}^q\overline{D}(T) \) decreases with *T*, whereas the profile of \( {}^qPD(T) \) always increases for *q* = 0 (Faith’s PD) and, for *q* > 0, generally increases up to a certain point. The profiles for both measures are therefore informative. The parameter *q* gives the sensitivity of the two measures to present-day species relative abundances. As with the ordinary Hill numbers, the measures with *q* = 2 favor more abundant species, so they are useful in ecological studies that examine the phylogenetic relationships of the dominant species in a set of assemblages, or in studies of functional diversity. The measures with *q* = 0 ignore abundances and thus give rare species the same weight as abundant ones, so they are useful when abundance information is not relevant (e.g., when ecologists try to identify past episodes of differentiation, or in some conservation applications). The measures with *q* = 1 weight species according to their frequencies and can be used in most applications where neither dominant nor rare species should be favored.

When evolutionary change is measured by the number of nucleotide base changes at a selected locus, or by the amount of functional or morphological differentiation from a common ancestor, the root-to-tip distances of the resulting tree will generally be uneven, so the tree is non-ultrametric. In this case, Chao et al. (2010) showed that the time parameter *T* in all formulas should be replaced by the *mean base change* or *mean branch length* \( \overline{T} \), the mean of the distances from the tree base to each of the terminal branch tips (i.e., the mean evolutionary change per species over the interval of interest). See the right panel of Fig. 1 for an illustrative example. Let \( \mathbf{B}_{\overline{T}} \) denote the set of branches connecting all focal species, with mean branch length \( \overline{T} \). Then \( \overline{T} \) can be expressed as \( \overline{T} = \sum_{i \in \mathbf{B}_{\overline{T}}} L_i a_i \), so the mean is weighted by species relative abundances. The diversity of a non-ultrametric tree with mean evolutionary change \( \overline{T} \) is the same as that of an ultrametric tree with time parameter \( \overline{T} \). Therefore, the diversity formulas for a non-ultrametric tree are obtained by replacing *T* by \( \overline{T} \) in Eqs. (4a), (4b), (5a), and (5b). The resulting measures are denoted respectively as \( {}{}^q\overline{D}\;\left(\overline{T}\right) \), \( {}{}^1\overline{D}\;\left(\overline{T}\right) \), \( {}{}^qPD\;\left(\overline{T}\right) \) and \( {}{}^1PD\;\left(\overline{T}\right) \); see Chao et al. (2010) for details. When we compare the phylogenetic diversity based on the measures \( {}{}^q\overline{D}\;\left(\overline{T}\right) \) and \( {}{}^qPD\;\left(\overline{T}\right) \) for several non-ultrametric trees, all measures should refer to the same mean base change \( \overline{T} \) to make meaningful comparisons.
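A minimal Python sketch of the non-ultrametric case, assuming the same `(L_i, a_i)` branch list used in earlier sketches. The hypothetical tree has root-to-tip distances 4 (A), 2 (B) and 2 (C), so it is not ultrametric. The sketch checks that \( \overline{T} = \sum_i L_i a_i \) equals the abundance-weighted mean root-to-tip distance, then evaluates Eq. (4a) with *T* replaced by \( \overline{T} \). Function names and data are illustrative only.

```python
import math


def mean_branch_length(branches):
    """T-bar = sum_i L_i a_i over (L_i, a_i) pairs, i.e. the abundance-weighted
    mean distance from the tree base to the tips."""
    return sum(L * a for L, a in branches)


def mean_phylo_diversity(branches, q, T):
    """qD-bar with T replaced by T-bar for a non-ultrametric tree."""
    used = [(L, a) for L, a in branches if L > 0 and a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(L / T * a * math.log(a) for L, a in used))
    return sum(L / T * a ** q for L, a in used) ** (1.0 / (1.0 - q))


# Hypothetical non-ultrametric tree: root -> X (length 1), X -> A (3), X -> B (1),
# root -> C (2); relative abundances pA = 0.5, pB = 0.2, pC = 0.3.
BRANCHES = [(1.0, 0.7), (3.0, 0.5), (1.0, 0.2), (2.0, 0.3)]
ROOT_TO_TIP = {"A": 4.0, "B": 2.0, "C": 2.0}
P = {"A": 0.5, "B": 0.2, "C": 0.3}

T_bar = mean_branch_length(BRANCHES)
print(T_bar, sum(P[s] * ROOT_TO_TIP[s] for s in P))  # both give the weighted mean
print([round(mean_phylo_diversity(BRANCHES, q, T_bar), 3) for q in (0, 1, 2)])
```

With \( \overline{T} \) defined this way, the lineage weights \( L_i a_i / \overline{T} \) sum to 1, exactly as \( L_i a_i / T \) do for an ultrametric tree.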

### Replication Principle for Phylogenetic Diversity Measures

The replication principle was generalized to a phylogenetic version in Chao et al. (2010). Suppose there are *N* equally large and completely phylogenetically distinct assemblages (no shared lineages across assemblages, though lineages within an assemblage may be shared); see Fig. 2 (reproduced from Chiu et al. 2014) for an illustrative example. Suppose these assemblages have the same phylogenetic Hill number *X*. If these assemblages are pooled, then the pooled assemblage must have a phylogenetic Hill number *N* × *X*. In the proof of this replication principle, Chao et al. (2010) assumed that these *N* assemblages have the same mean branch lengths. Here we relax this assumption and allow assemblages to have different mean branch lengths. (In the special case of ultrametric trees, this means that we allow different time perspectives for different assemblages.)

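For assemblage *k*, the mean branch length is \( \overline{T}_k \) and the branch set is \( B_{\overline{T}_k,k} \). For notational simplicity, the following proof omits \( \overline{T}_k \) from the subscript and writes **B**_{k}. The branch lengths are {*L*_{ik}; *i* ∈ **B**_{k}} and the corresponding node abundances are {*a*_{ik}; *i* ∈ **B**_{k}}, *k* = 1, 2, …, *N*. Assume that all assemblages have the same phylogenetic Hill number \( {}^q\overline{D}\left(\overline{T}_k\right)=X \). This implies \( \sum_{i\in B_k}L_{ik}\,a_{ik}^q=X^{1-q}\,\overline{T}_k \) for all *k* = 1, 2, …, *N*. When the *N* trees are pooled with equal weight for each tree, each node abundance *a*_{ik} in the pooled tree becomes *a*_{ik}/*N*, and the mean branch length becomes \( \overline{T}=(1/N)\sum_{k=1}^N\overline{T}_k \). Then the phylogenetic Hill number of order *q* (*q* ≠ 1) for the pooled assemblage becomes

$$ {}^q\overline{D}\left(\overline{T}\right)=\left[\sum_{k=1}^N\sum_{i\in B_k}\frac{L_{ik}}{\overline{T}}\left(\frac{a_{ik}}{N}\right)^q\right]^{1/(1-q)}=\left[\frac{N^{-q}}{\overline{T}}\sum_{k=1}^N X^{1-q}\,\overline{T}_k\right]^{1/(1-q)}=\left[(NX)^{1-q}\right]^{1/(1-q)}=N\times X, $$

since \( \sum_{k=1}^N\overline{T}_k=N\overline{T} \). The case *q* = 1 follows by letting *q* tend to unity. The pooled assemblage therefore has phylogenetic Hill number *N* × *X* even when the assemblages have different mean branch lengths.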
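The replication principle can also be checked numerically, including the relaxed assumption of unequal mean branch lengths. In this sketch the branch lengths and node abundances are hypothetical. Rescaling every branch length of an assemblage leaves its Hill number *X* unchanged but changes its mean branch length. The sketch builds such rescaled copies, treats them as completely distinct assemblages, and pools them with equal weight (*a*_{ik} → *a*_{ik}/*N*). It assumes the non-ultrametric formula with \( \overline{T}=\sum L_i a_i \) in place of *T*.

```python
import math


def phylo_hill(branches, q):
    """qD(T-bar) for (L_i, a_i) pairs, with T-bar = sum_i L_i * a_i."""
    tbar = sum(L * a for L, a in branches)
    terms = [(L / tbar, a) for L, a in branches if a > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(w * a * math.log(a) for w, a in terms))
    return sum(w * a ** q for w, a in terms) ** (1.0 / (1.0 - q))


def pool(assemblages):
    """Pool completely distinct assemblages with equal weight: a_ik -> a_ik / N.

    No branch is shared, so the pooled branch set is the union of the sets.
    """
    n = len(assemblages)
    return [(L, a / n) for asm in assemblages for L, a in asm]


# Hypothetical non-ultrametric branch set: (branch length, node abundance).
asm1 = [(1.0, 0.8), (2.0, 0.5), (3.0, 0.3), (4.0, 0.2)]
# Rescaling every branch length leaves X unchanged but changes T-bar.
asm2 = [(2.0 * L, a) for L, a in asm1]   # T-bar doubled
asm3 = [(0.5 * L, a) for L, a in asm1]   # T-bar halved

for q in (0, 0.5, 1, 2, 3):
    X = phylo_hill(asm1, q)
    print(f"q={q}: X={X:.4f}  pooled N=2: {phylo_hill(pool([asm1, asm2]), q):.4f}"
          f"  pooled N=3: {phylo_hill(pool([asm1, asm2, asm3]), q):.4f}")
```

Pooling two or three such assemblages yields 2*X* and 3*X* for every order tested, as the proof predicts.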

## Decomposition of Phylogenetic Diversity Measures

Decomposition of species richness and its phylogenetic analogues into within- and between-group (alpha and beta) components is widely used (Whittaker 1972; Faith et al. 2009). However, these take no notice of abundance differences between sites. Conservationists using these measures cannot distinguish a site whose species are equally abundant from a site with the same species but with a highly skewed abundance distribution whose most phylogenetically distinctive species are rare. The former site would be a better bet for conservation. These considerations, and others, motivate the development of decomposition theory for abundance-based phylogenetic diversity measures. The decomposition also leads to abundance-sensitive measures of phylogenetic similarity and complementarity.

When there are *N* assemblages, the phylogenetic Hill numbers \( {}^q\overline{D}(T) \) (Eqs. 4a and 4b) and the phylogenetic diversity ^{q}*PD*(*T*) (Eqs. 5a and 5b) of the pooled assemblage can be multiplicatively decomposed into independent alpha and beta components (Chiu et al. 2014). We briefly describe the decomposition of the measure \( {}^q\overline{D}(T) \) here for the ultrametric case, and only summarize the decomposition of the measure ^{q}*PD*(*T*). The extension to the non-ultrametric case for both measures is obtained by simply replacing all *T* in the formulas with the mean branch length \( \overline{T} \) of the pooled assemblage.

To begin the partitioning, a pooled tree is constructed for the *N* assemblages. Assume that there are *S* species in the present-day assemblage (i.e., there are *S* tip nodes). For any tip node *i*, let *z*_{ik} denote any measure of species importance of the *i*th species in the *k*th assemblage, *i* = 1, 2, …, *S*, *k* = 1, 2, …, *N*. The measure *z*_{ik} is referred to as “abundance” for simplicity, although it can be absolute abundances, relative abundances, incidence, biomasses, cover areas or any other importance measure. Define \( {z}_{+k}={\displaystyle \sum}_{i=1}^S{z}_{ik} \) (i.e., the “+” sign in *z*_{+k} denotes a sum over the tip nodes only) as the current size of the *k*th assemblage. Let \( {z}_{++}={\displaystyle \sum}_{k=1}^N{z}_{+k} \) be the total abundance in the present-day pooled assemblage.

Now consider the phylogenetic tree in the time interval [−*T*, 0], and in the pooled assemblage define **B**_{T} and *L*_{i} as in section "Phylogenetic Hill numbers and related measures". We extend the definition of *z*_{ik} to all nodes and their corresponding branches. For every *i* ∈ **B**_{T}, *z*_{ik} is defined as the total abundance in assemblage *k* descended from branch *i*. Here the index *i* can refer to either a tip node or an internal node; if *i* is a tip node, then *z*_{ik} represents the data of the current assemblage as defined in the preceding paragraph. As shown in Fig. 2 of Chiu et al. (2014), the diversity of each individual assemblage can be computed from the pooled tree structure, and only the node abundances vary among assemblages.
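This extension of *z*_{ik} from tips to internal nodes can be sketched in Python. The sketch assumes a hypothetical pooled tree stored as child → parent and made-up tip counts for *N* = 2 assemblages. Each tip's abundance is added to every branch on its path to the tree base. *z*_{+k} and *z*_{++} are summed over tip nodes only, as in the definition above. The names `PARENT`, `TIPS` and `node_abundance_matrix` are illustrative.

```python
# Hypothetical pooled tree ((A,B),(C,D)): child -> parent ("root" = tree base).
PARENT = {"A": "AB", "B": "AB", "C": "CD", "D": "CD", "AB": "root", "CD": "root"}

# Tip importance values z_ik for N = 2 assemblages (absolute counts here;
# any importance measure works).  Hypothetical numbers for illustration.
TIPS = {"A": [8, 0], "B": [2, 5], "C": [0, 3], "D": [10, 2]}


def node_abundance_matrix(parent, tips):
    """Return {node: [z_i1, ..., z_iN]} for every tip and internal node.

    z_ik of an internal node is the total abundance, in assemblage k, of all
    tips descending from the branch above that node.
    """
    n_asm = len(next(iter(tips.values())))
    z = {node: [0] * n_asm for node in parent}
    for tip, counts in tips.items():
        node = tip
        while node in parent:          # walk up to (but excluding) the tree base
            for k, c in enumerate(counts):
                z[node][k] += c
            node = parent[node]
    return z


Z = node_abundance_matrix(PARENT, TIPS)
z_plus_k = [sum(col) for col in zip(*TIPS.values())]   # z_{+k}: tip nodes only
z_plus_plus = sum(z_plus_k)                             # z_{++}
print(Z, z_plus_k, z_plus_plus)
```

A node with *z*_{ik} = 0 (e.g., tip A in the second assemblage) is simply absent from that assemblage's tree, although it remains in the pooled tree.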

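In the pooled assemblage, the total abundance of node/branch *i* (*i* ∈ **B**_{T}) is \( z_{i+}=\sum_{k=1}^N z_{ik} \), with branch relative abundance *z*_{i+}/*z*_{++}. The phylogenetic gamma diversity of order *q* can therefore be calculated from Eq. (4a) as

$$ {}^q\overline{D}_{\gamma}(T)=\left[\sum_{i\in B_T}\frac{L_i}{T}\left(\frac{z_{i+}}{z_{++}}\right)^q\right]^{1/(1-q)},\quad q\ge 0,\ q\ne 1. $$

The limit as *q* approaches unity exists and is equal to

$$ {}^1\overline{D}_{\gamma}(T)=\exp\left[-\sum_{i\in B_T}\frac{L_i}{T}\,\frac{z_{i+}}{z_{++}}\log\left(\frac{z_{i+}}{z_{++}}\right)\right]. $$

The gamma diversity is interpreted as the effective number of equally abundant and equally distinct lineages, all with branch length *T*, in the pooled assemblage.

Following Chiu et al. (2014), the phylogenetic alpha diversity for *q* ≥ 0 and *q* ≠ 1 is

$$ {}^q\overline{D}_{\alpha}(T)=\frac{1}{N}\left[\sum_{i\in B_T}\sum_{k=1}^N\frac{L_i}{T}\left(\frac{z_{ik}}{z_{++}}\right)^q\right]^{1/(1-q)}. $$(8a)

For *q* = 1, we have

$$ {}^1\overline{D}_{\alpha}(T)=\frac{1}{N}\exp\left[-\sum_{i\in B_T}\sum_{k=1}^N\frac{L_i}{T}\,\frac{z_{ik}}{z_{++}}\log\left(\frac{z_{ik}}{z_{++}}\right)\right]. $$(8b)

The alpha diversity is interpreted as the effective number of equally abundant and equally distinct lineages, all with branch length *T*, in an individual assemblage. Normalized measures of species importance, such as relative abundance or relative biomass, give *z*_{++} = *N* in Eqs. (8a) and (8b). The alpha formula then reduces to a generalized mean of the local diversities. This mean has the following property: if all assemblages have the same diversity *X*, the alpha diversity is also *X* (Jost 2007). For non-normalized measures of species importance, such as absolute abundance or biomass, this property does not hold. The reason is that, when absolute abundances are compared, a three-species assemblage with abundances {2, 5, 8} is not treated as identical to one with abundances {200, 500, 800}. The two assemblages are treated as identical only when relative abundances are compared.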

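Chiu et al. (2014) proved that the phylogenetic gamma diversity is always greater than or equal to the phylogenetic alpha diversity for *all q* ≥ 0, regardless of species abundances and tree structures. Under a multiplicative partitioning, the phylogenetic beta diversity is the ratio of gamma diversity to alpha diversity:

$$ {}^q\overline{D}_{\beta}(T)=\frac{{}^q\overline{D}_{\gamma}(T)}{{}^q\overline{D}_{\alpha}(T)}. $$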

When the *N* assemblages are identical in species identities and species abundances, \( {}^q\overline{D}_{\beta}(T)=1 \) for any *T*. When the *N* assemblages are completely phylogenetically distinct (no shared lineages), \( {}^q\overline{D}_{\beta}(T)=N \), no matter what the diversities or tree shapes of the assemblages. The measure \( {}^q\overline{D}_{\beta}(T) \) thus quantifies the effective number of completely phylogenetically distinct assemblages in the interval [−*T*, 0]. As proved by Chiu et al. (2014), the phylogenetic beta diversity \( {}^q\overline{D}_{\beta}(T) \) always lies between unity and *N* for any given alpha value. This implies that the alpha and beta components are unrelated (or independent) for both measures, \( {}^q\overline{D}(T) \) and ^{q}*PD*(*T*); see Chao et al. (2012) for a rigorous discussion of the un-relatedness and independence of two measures. When all lineages in the pooled assemblage are completely distinct (no lineages shared) in the interval [−*T*, 0], the phylogenetic alpha, beta and gamma Hill numbers reduce to those based on ordinary Hill numbers. This includes the limiting case in which *T* tends to zero, so that phylogeny is ignored.
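The decomposition can be sketched in Python on a hypothetical ultrametric tree ((A:1,B:1):2,(C:2,D:2):1) with *T* = 3. The sketch assumes the gamma formula (the single-assemblage Hill number applied to *z*_{i+}/*z*_{++}) and the alpha formula of Chiu et al. (2014), \( (1/N)\{\sum_i\sum_k (L_i/T)(z_{ik}/z_{++})^q\}^{1/(1-q)} \). Beta is computed as their ratio. The node order, abundance matrices and function names are illustrative. It checks three cases. Identical assemblages give beta = 1. Completely distinct assemblages give beta = *N*. Mirror-image assemblages with equal diversity *X* give alpha = *X*.

```python
import math


def _hill(weights, props, q):
    """[sum w*p^q]^(1/(1-q)), or exp(-sum w*p*log p) at q = 1.

    Terms with p = 0 (a branch absent from an assemblage) are skipped.
    """
    terms = [(w, p) for w, p in zip(weights, props) if p > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(w * p * math.log(p) for w, p in terms))
    return sum(w * p ** q for w, p in terms) ** (1.0 / (1.0 - q))


def decompose(L, Z, tips, T, q):
    """Return (gamma, alpha, beta) phylogenetic Hill numbers (ultrametric tree).

    L[i]    -- length of branch i in the pooled branch set B_T
    Z[i][k] -- abundance z_ik of node/branch i in assemblage k
    tips    -- indices of tip nodes (z_{++} is summed over tips only)
    """
    n = len(Z[0])
    z_pp = sum(sum(Z[i]) for i in tips)
    w = [l / T for l in L]
    gamma = _hill(w, [sum(row) / z_pp for row in Z], q)         # z_{i+}/z_{++}
    alpha = _hill([wi for wi in w for _ in range(n)],            # one term per (i, k)
                  [z / z_pp for row in Z for z in row], q) / n
    return gamma, alpha, gamma / alpha


# Node order: A, B, AB, C, D, CD; internal-node rows are sums of their tips.
L = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]
TIPS, T = [0, 1, 3, 4], 3.0
identical = [[.4, .4], [.1, .1], [.5, .5], [.3, .3], [.2, .2], [.5, .5]]
distinct = [[.8, 0], [.2, 0], [1, 0], [0, .6], [0, .4], [0, 1]]   # no shared branch
mirrored = [[.4, .1], [.1, .4], [.5, .5], [.3, .2], [.2, .3], [.5, .5]]

for q in (0, 1, 2):
    print(q, [round(decompose(L, Z, TIPS, T, q)[2], 4) for Z in (identical, distinct, mirrored)])
```

In the `distinct` case the two clades meet only at the root, so no branch within [−*T*, 0] is shared.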

A parallel decomposition can be made for the phylogenetic diversity ^{q}*PD*(*T*), and we summarize the following relations: \( {}^qPD_{\gamma}(T)={}^q\overline{D}_{\gamma}(T)\times T \) and \( {}^qPD_{\alpha}(T)={}^q\overline{D}_{\alpha}(T)\times T \). Under a multiplicative partitioning scheme, we have \( {}^qPD_{\beta}(T)={}^qPD_{\gamma}(T)/{}^qPD_{\alpha}(T)={}^q\overline{D}_{\beta}(T) \). The beta components from partitioning the phylogenetic Hill numbers \( {}^q\overline{D}(T) \) and the phylogenetic diversity ^{q}*PD*(*T*) are therefore identical. Their interpretation and the corresponding similarity or differentiation measures (in the next section) are also identical. Thus, it is sufficient to focus only on the measure \( {}^q\overline{D}_{\beta}(T) \), which will be referred to as the *phylogenetic beta diversity* or beta component for simplicity.

For each of the two measures, \( {}^q\overline{D}(T) \) and ^{q}*PD*(*T*), alpha and gamma diversities obey the replication principle. Consequently, the beta diversity, formed as their ratio, is replication-invariant (Chiu et al. 2014): when assemblages are replicated, the beta diversity does not change. Therefore, when equally distinct sub-trees (such as equally ancient subfamilies) that all show the same beta diversity are pooled, the beta diversity of the pooled tree is unchanged ("consistency in aggregation").

The phylogenetic beta diversity takes the following explicit forms for the special cases *q* = 0, 1 and 2.

- (a) When *q* = 0, we have \( {}^0\overline{D}_{\beta}(T)=L_{\gamma}(T)/L_{\alpha}(T) \). Here *L*_{γ}(*T*) denotes the total branch length of the pooled tree (the gamma component of Faith's PD). *L*_{α}(*T*) denotes the average total branch length of the individual trees (the alpha component of Faith's PD).
- (b) When *q* = 1, the phylogenetic beta diversity of order 1 is
  $$ {}^1\overline{D}_{\beta}(T)=\exp\left[\left(H_{P,\gamma}-H_{P,\alpha}\right)/T+\sum_{k=1}^N\left(\frac{z_{+k}}{z_{++}}\right)\log\left(\frac{z_{+k}}{z_{++}}\right)+\log N\right], $$(10a)
  where *H*_{P,γ} and *H*_{P,α} denote respectively the gamma and alpha phylogenetic entropy. When the species importance measure *z*_{ik} is the relative abundance of the *i*th species in the *k*th current-time assemblage, then \( z_{+k}=1,\ z_{++}=N,\ z_{+k}/z_{++}=1/N \). In this special case, we have \( {}^1\overline{D}_{\beta}(T)=\exp\left[\left(H_{P,\gamma}-H_{P,\alpha}\right)/T\right] \). Thus an additive decomposition holds for phylogenetic entropy *H*_{P} (Pavoine et al. 2009; Mouchet and Mouillot 2011), as for ordinary Shannon entropy (Jost 2007).
- (c) When *q* = 2, the phylogenetic beta diversity can be expressed as
  $$ {}^2\overline{D}_{\beta}(T)=\frac{N\sum_{i\in B_T}L_i\sum_{k=1}^N z_{ik}^2}{\sum_{i\in B_T}L_i z_{i+}^2}. $$
  When the species importance measure is relative abundance, this reduces to \( {}^2\overline{D}_{\beta}(T)=\left(T-Q_{\alpha}\right)/\left(T-Q_{\gamma}\right) \), where *Q*_{γ} and *Q*_{α} denote respectively the gamma and alpha quadratic entropy. These formulas also apply to non-ultrametric trees if all *T* are replaced with \( \overline{T} \), the mean branch length in the pooled assemblage; see Chiu et al. (2014, Appendix C) for a proof.
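As a consistency check, the sketch below compares the *q* = 0, 1 and 2 expressions with the general ratio gamma/alpha. It uses hypothetical absolute abundances for two assemblages of unequal size on an ultrametric tree. The *q* = 1 check relies on an assumption: *H*_{P,α} is taken as the size-weighted mean of the single-assemblage phylogenetic entropies \( H_P=-\sum_i L_i p_i\log p_i \) (Allen et al. 2009). Under that definition, Eq. (10a) matches gamma/alpha in this sketch. The *q* = 2 check uses \( N\sum_i L_i\sum_k z_{ik}^2/\sum_i L_i z_{i+}^2 \); the factor *N* is what makes beta equal 1 for identical assemblages. Function names are illustrative.

```python
import math


def _hill(weights, props, q):
    """[sum w*p^q]^(1/(1-q)), or exp(-sum w*p*log p) at q = 1; zero p skipped."""
    terms = [(w, p) for w, p in zip(weights, props) if p > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(w * p * math.log(p) for w, p in terms))
    return sum(w * p ** q for w, p in terms) ** (1.0 / (1.0 - q))


def beta_general(L, Z, tips, T, q):
    """Reference value: beta = gamma / alpha from the general formulas."""
    n = len(Z[0])
    z_pp = sum(sum(Z[i]) for i in tips)
    w = [l / T for l in L]
    gamma = _hill(w, [sum(r) / z_pp for r in Z], q)
    alpha = _hill([x for x in w for _ in range(n)], [z / z_pp for r in Z for z in r], q) / n
    return gamma / alpha


def beta_q0(L, Z):
    """q = 0: L_gamma / L_alpha (pooled tree length over mean single-tree length)."""
    n = len(Z[0])
    L_gamma = sum(l for l, r in zip(L, Z) if sum(r) > 0)
    L_alpha = sum(l for l, r in zip(L, Z) for z in r if z > 0) / n
    return L_gamma / L_alpha


def beta_q1(L, Z, tips, T):
    """q = 1: Eq. (10a), written with phylogenetic entropies."""
    n = len(Z[0])
    z_k = [sum(Z[i][k] for i in tips) for k in range(n)]    # z_{+k}
    z_pp = sum(z_k)

    def h_p(col, total):                                      # -sum L_i p_i log p_i
        return -sum(l * (z / total) * math.log(z / total) for l, z in zip(L, col) if z > 0)

    h_gamma = h_p([sum(r) for r in Z], z_pp)
    # Assumption: H_{P,alpha} = size-weighted mean of per-assemblage H_P.
    h_alpha = sum(z_k[k] / z_pp * h_p([r[k] for r in Z], z_k[k]) for k in range(n))
    size_term = sum(zk / z_pp * math.log(zk / z_pp) for zk in z_k)
    return math.exp((h_gamma - h_alpha) / T + size_term + math.log(n))


def beta_q2(L, Z):
    """q = 2: N * sum_i L_i sum_k z_ik^2 / sum_i L_i z_{i+}^2."""
    n = len(Z[0])
    num = n * sum(l * sum(z * z for z in r) for l, r in zip(L, Z))
    return num / sum(l * sum(r) ** 2 for l, r in zip(L, Z))


# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1), node order A, B, AB, C, D, CD;
# two assemblages with absolute abundances and unequal sizes (20 and 10).
L = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]
Z = [[8, 0], [2, 5], [10, 5], [0, 3], [10, 2], [10, 5]]
TIPS, T = [0, 1, 3, 4], 3.0

print(beta_q0(L, Z), beta_q1(L, Z, TIPS, T), beta_q2(L, Z))
```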

## Normalized Phylogenetic Similarity Measures

For traditional abundance-based diversity, the most commonly used similarity measures include *N*-assemblage generalizations of the Jaccard, Sørensen, Horn (Horn 1966) and Morisita-Horn (Morisita 1959) measures. The latter three measures were integrated into a class of *C*_{qN} measures by Chao et al. (2008). Jost (2006, 2007), Chao et al. (2008, 2012) and Chiu et al. (2014) have demonstrated that all the above measures are monotonic transformations of beta diversity based on ordinary Hill numbers. This is an advantage of the Hill-number framework: a direct link exists between diversity and similarity (or differentiation) among assemblages.

The phylogenetic beta diversity \( {}^q\overline{D}_{\beta}(T) \) lies in the range [1, *N*] for any measure of species importance and all orders *q* ≥ 0. Because the range depends on *N*, the phylogenetic beta diversity cannot be used to compare phylogenetic differentiation across multiple regions that contain different numbers of assemblages. To remove the dependence on *N*, several transformations can map the phylogenetic beta component onto [0, 1]. These measure local overlap, regional overlap, homogeneity and turnover. We summarize the four transformations below. Table 1 gives the formulas for the two most important classes and their relationship with previous measures, including the special cases *q* = 0, 1 and 2.

1. *A class of branch overlap measures from a local perspective*:
   $$ {\overline{C}}_{qN}(T)=\frac{N^{1-q}-{\left[{}^q\overline{D}_{\beta}(T)\right]}^{1-q}}{N^{1-q}-1}. $$(11a)
   This gives the effective average proportion of shared branches in an individual assemblage. This class of similarity measures extends the *C*_{qN} overlap measure derived in Chao et al. (2008) to a phylogenetic version. The corresponding differentiation measure \( 1-{\overline{C}}_{qN}(T) \) quantifies the effective average proportion of non-shared branches in an individual assemblage.
   - (1a) For *q* = 0, this similarity measure is referred to as the "phylo-Sørensen" *N*-assemblage overlap measure because for *N* = 2 it reduces to the measure *PhyloSør* (phylo-Sørensen) developed by Bryant et al. (2008) and Ferrier et al. (2007).
   - (1b) For *q* = 1, the measure \( {\overline{C}}_{1N}(T) \) is called the "phylo-Horn" *N*-assemblage overlap measure because it extends Horn's (1966) two-assemblage measure to incorporate phylogenies for *N* assemblages.
   - (1c) For *q* = 2, \( {\overline{C}}_{2N}(T) \) is called the "phylo-Morisita-Horn" *N*-assemblage similarity measure because it extends the Morisita-Horn measure (Morisita 1959) to incorporate phylogenies for *N* assemblages. When the species importance measure is relative abundance, the differentiation measure \( 1-{\overline{C}}_{2N}(T) \) reduces to the measure proposed by de Bello et al. (2010). However, their measure is valid only for ultrametric trees (p. 7 of de Bello et al. 2010). Here, the measure can be applied to non-ultrametric trees to obtain
     $$ 1-{\overline{C}}_{2N}\left(\overline{T}\right)=\frac{1-\left[1/{}^2\overline{D}_{\beta}\left(\overline{T}\right)\right]}{1-1/N}=\frac{Q_{\gamma}-Q_{\alpha}}{\left(1-1/N\right)\left(\overline{T}-Q_{\alpha}\right)}, $$(11b)
     where *Q*_{γ} and *Q*_{α} are respectively the gamma and alpha quadratic entropy, and \( \overline{T} \) is the mean branch length in the pooled assemblage. A general form for any species importance measure (including absolute abundances) is
     $$ 1-{\overline{C}}_{2N}\left(\overline{T}\right)=\frac{\sum_{i\in B_{\overline{T}}}L_i\sum_{m>k}^{N}\left(z_{im}-z_{ik}\right)^2}{\left(N-1\right)\sum_{i\in B_{\overline{T}}}L_i\sum_{k=1}^N z_{ik}^2}. $$(11c)
     This expression shows that the similarity index \( {\overline{C}}_{2N}\left(\overline{T}\right) \), like all other abundance-sensitive similarity measures, equals unity if and only if \( z_{ij}=z_{ik} \) for every node *i* in the branch set and any two assemblages *j* and *k*. In other words, the species importance values must be identical node by node. The similarity index \( {\overline{C}}_{2N}\left(\overline{T}\right) \) thus quantifies, from a local perspective, the node-by-node resemblance among the *N* abundance sets {*z*_{ik}; *i* ∈ **B**_{T̅}}, *k* = 1, 2, …, *N*. See Fig. 2 of Chiu et al. (2014) for a simple example of the framework.
2. *A class of branch overlap measures from a regional perspective*:
   $$ {\overline{U}}_{qN}(T)=\frac{{\left[1/{}^q\overline{D}_{\beta}(T)\right]}^{1-q}-{\left(1/N\right)}^{1-q}}{1-{\left(1/N\right)}^{1-q}}. $$(12a)
   This class of measures quantifies the effective proportion of shared branches in the pooled assemblage. The corresponding differentiation measure \( 1-{\overline{U}}_{qN}(T) \) quantifies the effective average proportion of non-shared branches in the pooled assemblage.
   - (2a) For *q* = 0, this measure is called the "phylo-Jaccard" *N*-assemblage measure because for *N* = 2 the differentiation measure \( 1-{\overline{U}}_{02}(T) \) reduces to the Jaccard-type *UniFrac* measure developed by Lozupone and Knight (2005) and the PD-dissimilarity measure developed by Faith et al. (2009).
   - (2b) For *q* = 1, this measure is identical to the "phylo-Horn" *N*-assemblage overlap measure \( {\overline{C}}_{1N}(T) \); see Table 1.
   - (2c) For *q* = 2, we refer to the measure \( {\overline{U}}_{2N}(T) \) as a "phylo-regional-overlap" measure. When the species importance measure is relative abundance, we have the following formula for non-ultrametric trees:
     $$ 1-{\overline{U}}_{2N}\left(\overline{T}\right)=\frac{{}^2\overline{D}_{\beta}\left(\overline{T}\right)-1}{N-1}=\frac{Q_{\gamma}-Q_{\alpha}}{\left(N-1\right)\left(\overline{T}-Q_{\gamma}\right)}, $$
     where \( \overline{T} \) denotes the mean branch length in the pooled assemblage. A general form for any species importance measure (including absolute abundances) is
     $$ 1-{\overline{U}}_{2N}\left(\overline{T}\right)=\frac{\sum_{i\in B_{\overline{T}}}L_i\sum_{m>k}^{N}\left(z_{im}-z_{ik}\right)^2}{\left(N-1\right)\sum_{i\in B_{\overline{T}}}L_i z_{i+}^2}. $$
     The numerator is the same as that of \( 1-{\overline{C}}_{2N}\left(\overline{T}\right) \). The similarity index \( {\overline{U}}_{2N}\left(\overline{T}\right) \) therefore also quantifies the node-by-node resemblance among the *N* abundance sets {*z*_{ik}; *i* ∈ **B**_{T̅}}, *k* = 1, 2, …, *N*. Here, however, the normalizing denominator is different and takes a regional perspective.
3. *A class of phylogenetic homogeneity measures*:
   $$ {\overline{S}}_{qN}(T)=\frac{1/{}^q\overline{D}_{\beta}(T)-1/N}{1-1/N}. $$(12b)
   This measure is linear in the proportion of regional phylogenetic diversity contained in a typical assemblage, i.e., in \( 1/{}^q\overline{D}_{\beta}(T) \).
   - (3a) For *q* = 0, it reduces to the "phylo-Jaccard" measure \( {\overline{U}}_{0N}(T) \), i.e., \( {\overline{S}}_{0N}(T)={\overline{U}}_{0N}(T) \).
   - (3b) For *q* = 1, this measure does not reduce to the "phylo-Horn" overlap measure.
   - (3c) For *q* = 2, this measure is identical to \( {\overline{C}}_{2N}(T) \), the "phylo-Morisita-Horn" similarity measure, i.e., \( {\overline{S}}_{2N}(T)={\overline{C}}_{2N}(T) \).
4. *A class of measures of the complement of "phylogenetic turnover rate"*:
   $$ {\overline{V}}_{qN}(T)=\frac{N-{}^q\overline{D}_{\beta}(T)}{N-1}=1-\frac{{}^q\overline{D}_{\beta}(T)-1}{N-1}. $$(12c)
   This measure is linear in the phylogenetic beta diversity. The corresponding differentiation measure \( \left[{}^q\overline{D}_{\beta}(T)-1\right]/\left(N-1\right) \) quantifies the relative branch turnover rate per assemblage.
   - (4a) For *q* = 0, the measure \( {\overline{V}}_{0N}(T) \) is identical to the "phylo-Sørensen" measure, i.e., \( {\overline{V}}_{0N}(T)={\overline{C}}_{0N}(T) \).
   - (4b) For *q* = 1, this measure does not reduce to the "phylo-Horn" overlap measure.
   - (4c) For *q* = 2, this measure is identical to \( {\overline{U}}_{2N}(T) \), the "phylo-regional-overlap" measure; that is, \( {\overline{V}}_{2N}(T)={\overline{U}}_{2N}(T) \).
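The four transformations are simple functions of beta, *N* and *q*, and a short Python sketch can check their stated relationships. It assumes that \( {\overline{C}}_{qN} \) and \( {\overline{U}}_{qN} \) at *q* = 1 take their common limit \( 1-\log\beta/\log N \). It verifies four identities: (3a) \( {\overline{S}}_{0N}={\overline{U}}_{0N} \), (4a) \( {\overline{V}}_{0N}={\overline{C}}_{0N} \), (3c) \( {\overline{S}}_{2N}={\overline{C}}_{2N} \) and (4c) \( {\overline{V}}_{2N}={\overline{U}}_{2N} \). It also computes beta as gamma/alpha from hypothetical absolute abundances for *N* = 3. This confirms that \( 1-{\overline{C}}_{2N} \) and \( 1-{\overline{U}}_{2N} \) match their node-by-node general forms. Data and function names are illustrative.

```python
import math
from itertools import combinations


def _hill(weights, props, q):
    """[sum w*p^q]^(1/(1-q)), or exp(-sum w*p*log p) at q = 1; zero p skipped."""
    terms = [(w, p) for w, p in zip(weights, props) if p > 0]
    if abs(q - 1.0) < 1e-12:
        return math.exp(-sum(w * p * math.log(p) for w, p in terms))
    return sum(w * p ** q for w, p in terms) ** (1.0 / (1.0 - q))


def phylo_beta(L, Z, tips, T, q):
    """Phylogenetic beta = gamma / alpha (ultrametric tree, any importance measure)."""
    n = len(Z[0])
    z_pp = sum(sum(Z[i]) for i in tips)
    w = [l / T for l in L]
    gamma = _hill(w, [sum(r) / z_pp for r in Z], q)
    alpha = _hill([x for x in w for _ in range(n)], [z / z_pp for r in Z for z in r], q) / n
    return gamma / alpha


def similarity_measures(beta, n, q):
    """(C_qN, U_qN, S_qN, V_qN) from beta via Eqs. (11a), (12a), (12b), (12c).

    At q = 1, C and U are evaluated at their common limit 1 - log(beta)/log(N).
    """
    if abs(q - 1.0) < 1e-12:
        c = u = 1.0 - math.log(beta) / math.log(n)
    else:
        c = (n ** (1 - q) - beta ** (1 - q)) / (n ** (1 - q) - 1)
        u = ((1 / beta) ** (1 - q) - (1 / n) ** (1 - q)) / (1 - (1 / n) ** (1 - q))
    s = (1 / beta - 1 / n) / (1 - 1 / n)
    v = (n - beta) / (n - 1)
    return c, u, s, v


# Hypothetical absolute abundances for N = 3 assemblages on the ultrametric tree
# ((A:1,B:1):2,(C:2,D:2):1); node order A, B, AB, C, D, CD.
L = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]
Z = [[8, 0, 4], [2, 5, 4], [10, 5, 8], [0, 3, 4], [10, 2, 0], [10, 5, 4]]
TIPS, T, N = [0, 1, 3, 4], 3.0, 3

beta2 = phylo_beta(L, Z, TIPS, T, 2)
c2, u2, s2, v2 = similarity_measures(beta2, N, 2)

# Node-by-node forms of 1 - C_2N (Eq. 11c) and 1 - U_2N, for comparison.
pair_sq = sum(l * sum((zm - zk) ** 2 for zk, zm in combinations(r, 2)) for l, r in zip(L, Z))
direct_1_minus_c2 = pair_sq / ((N - 1) * sum(l * sum(z * z for z in r) for l, r in zip(L, Z)))
direct_1_minus_u2 = pair_sq / ((N - 1) * sum(l * sum(r) ** 2 for l, r in zip(L, Z)))
print(round(1 - c2, 6), round(direct_1_minus_c2, 6), round(1 - u2, 6), round(direct_1_minus_u2, 6))
```

`itertools.combinations(r, 2)` enumerates each pair of assemblages exactly once. This matches the sum over *m* > *k* in the general forms.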

As with the phylogenetic diversity measures, all the above similarity and differentiation measures are functions of two parameters: the sensitivity parameter *q* and the time perspective *T*. We therefore suggest using the two types of profiles described in section "Phylogenetic Hill numbers and related measures" for the two major similarity measures, \( {\overline{C}}_{qN}(T) \) and *U̅*_{qN}(*T*), or for their complements. Together these profiles convey complete information about the similarity or differentiation of a set of assemblages. An example showing the two types of profiles is given in section "An example".

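An additive decomposition, based on lineage excess (the difference between the phylogenetic gamma and alpha Hill numbers), leads to the same classes of normalized measures. These classes thus provide *N*-assemblage phylogenetic generalizations of the classic Jaccard, Sørensen, Horn and Morisita-Horn measures, regardless of whether one prefers multiplicative or additive decompositions.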

Table 1. Two major classes of phylogenetic similarity measures based on transformations of phylogenetic beta diversity, when species importance measures are incidences (for *q* = 0), relative abundances or absolute abundances (for *q* = 1 and 2). The corresponding differentiation measures are the one-complements of the similarity measures. When all lineages are completely distinct (this includes \( T\to 0 \), ignoring phylogeny), these phylogenetic measures reduce to the corresponding non-phylogenetic versions. All measures can also be applied to non-ultrametric trees if \( \overline{T} \) is substituted for *T*.

| Order | Species importance measure | Phylo-local-overlap \( {\overline{C}}_{qN}(T)=\frac{{\left[1/{}^q\overline{D}_{\beta}(T)\right]}^{q-1}-{\left(1/N\right)}^{q-1}}{1-{\left(1/N\right)}^{q-1}} \) | Phylo-regional-overlap \( {\overline{U}}_{qN}(T)=\frac{{\left[1/{}^q\overline{D}_{\beta}(T)\right]}^{1-q}-{\left(1/N\right)}^{1-q}}{1-{\left(1/N\right)}^{1-q}} \) |
|---|---|---|---|
| *q* = 0 | Incidences | Phylo-Sørensen (= *PhyloSør* for *N* = 2) \( \frac{N-L_{\gamma}(T)/L_{\alpha}(T)}{N-1} \) | Phylo-Jaccard (= 1 − *UniFrac* for *N* = 2) \( \frac{L_{\alpha}(T)/L_{\gamma}(T)-1/N}{1-1/N} \) |
| *q* = 1 | Relative abundances | Phylo-Horn \( 1-\frac{H_{P,\gamma}-H_{P,\alpha}}{T\log N} \) | Phylo-Horn \( 1-\frac{H_{P,\gamma}-H_{P,\alpha}}{T\log N} \) |
| *q* = 1 | Absolute abundances | \( \frac{H_{P,\alpha}-H_{P,\gamma}-T\sum_{k=1}^N\frac{z_{+k}}{z_{++}}\log\left(\frac{z_{+k}}{z_{++}}\right)}{T\log N} \) | \( \frac{H_{P,\alpha}-H_{P,\gamma}-T\sum_{k=1}^N\frac{z_{+k}}{z_{++}}\log\left(\frac{z_{+k}}{z_{++}}\right)}{T\log N} \) |
| *q* = 2 | Relative abundances | Phylo-Morisita-Horn \( 1-\frac{Q_{\gamma}-Q_{\alpha}}{\left(1-1/N\right)\left(T-Q_{\alpha}\right)} \) | Phylo-regional-overlap \( 1-\frac{Q_{\gamma}-Q_{\alpha}}{\left(N-1\right)\left(T-Q_{\gamma}\right)} \) |
| *q* = 2 | Absolute abundances | \( 1-\frac{\sum_{i\in B_T}L_i\sum_{m>k}^{N}\left(z_{im}-z_{ik}\right)^2}{\left(N-1\right)\sum_{i\in B_T}L_i\sum_{k=1}^N z_{ik}^2} \) | \( 1-\frac{\sum_{i\in B_T}L_i\sum_{m>k}^{N}\left(z_{im}-z_{ik}\right)^2}{\left(N-1\right)\sum_{i\in B_T}L_i z_{i+}^2} \) |

## An Example

To illustrate the framework, we use abundance data for 52 rockfish species of the genus *Sebastes*, collected for 20 years over three decades (1980–1986, 1993–1994, 1996, 1998–2007) from the Southern California Bight, USA. The phylogenetic tree for these 52 species was obtained from Hyde and Vetter (2007); see Fig. 3a. The age of the root for these species is around 7.9 million years (Myr).

We separate the data into three decades: 1980s, 1990s and 2000s, which will be referred to as Assemblages (and Decades) I, II and III, respectively. Within each decade's assemblage, species abundances are pooled. The species relative abundances for the three assemblages are shown in Fig. 3a. There were 48, 44 and 39 species in Decades I, II and III, respectively. (Note that each data point here is a mean of many years' observations.) A sub-tree containing only the six dominant species (those with relative abundance >8 % in at least one assemblage) is shown in Fig. 3b. All six species are shared by the three assemblages, and four of them have been in isolated lineages for 6 Myr.

In Fig. 4a, we show the first type of diversity profile, i.e., diversity as a function of order *q*, 0 ≤ *q* ≤ 3. It is plotted for two selected temporal perspectives: *T* = 0 (phylogeny is ignored) and *T* = 7.9 Myr (the whole phylogenetic tree in Fig. 3a is considered). In Fig. 4b, we plot \( {}^q\overline{D}(T) \) and ^{q}*PD*(*T*) as functions of *T*, separately for *q* = 0, 1 and 2, for 0 ≤ *T* ≤ 10 Myr.

Based on our phylogenetic diversity measures, all profiles in Fig. 4 reveal that the diversity in the most recent decade (Decade III) is the lowest among the three decades in the rockfish assemblage. This implies an appreciable loss of species (as shown in the first type of profile for *T* = 0), loss of lineages (as shown in the second type of profile based on the measure \( {}{}^q\overline{D}\;(T) \)), and loss of evolutionary history (as shown in the second type of profiles based on the measure ^{q}*PD*(*T*)) over the three decades.

When species/lineage abundances are discounted (*q* = 0 in the left panels of Fig. 4b), both lineage richness (based on the measure \( {}^0\overline{D}(T) \)) and total branch length (based on the measure ^{0}*PD*(*T*), i.e., Faith's PD) exhibit the expected ordering: Decade I > Decade II > Decade III. When species/lineage abundances are taken into account (i.e., *q* = 1 and 2 in Fig. 4b), the profiles for Decades I and II cross. This is because Decade II has a more even abundance distribution among its dominant species than Decade I (see the first type of profile for *T* = 0 and Fig. 3a, b). For time depths greater than about 6 Myr (including *T* = 7.9 Myr, the age of the root), all the abundance-sensitive phylogenetic measures for the three assemblages are very close. This is because most of the dominant species began to diverge around 6 Myr ago (Fig. 3b). It also explains the closeness of the three profiles in the first type of profile for *T* = 7.9 Myr (the *right panel* in Fig. 4a).

To illustrate the phylogenetic differentiation among assemblages, we measure it between each of the three pairs of decades (Decades I vs. II, Decades I vs. III and Decades II vs. III). Figs. 5 and 6 show two types of profiles for each of the two differentiation measures, \( 1-{\overline{C}}_{qN}(T) \) and \( 1-{\overline{U}}_{qN}(T) \). These profiles show how the measures vary with the order *q* and with the time perspective *T*. Fig. 5a presents the first type of profile, plotting \( 1-{\overline{C}}_{qN}(T) \) as a function of *q* in the range [0, 3]. It uses two time perspectives: *T* = 0 (non-phylogenetic case) and *T* = 7.9 Myr (the age of the root node). Fig. 5b shows the same type of differentiation profile for the other measure, \( 1-{\overline{U}}_{qN}(T) \). Fig. 6a, b presents the second type of profile, which shows the two measures as functions of the temporal perspective *T*, 0 ≤ *T* ≤ 10 Myr, separately for *q* = 0, 1 and 2.

Based on the two phylogenetic differentiation measures, all profiles in Figs. 5 and 6 show consistent patterns. When species/lineage abundances are discounted (*q* = 0), the differentiation values of the three pairs of assemblages do not differ appreciably. This is shown in the two left panels of Fig. 6 and in the initial point of each profile in Fig. 5. When species/lineage abundances are taken into account (*q* > 0), the differentiation between Decades I and II is generally close to that between Decades I and III. The differentiation between the two recent decades (Decades II vs. III) is much lower than that of either of the other two pairs. This implies that the composition of species/lineage abundances changed between the 1980s and the 1990s. The relative abundances of the dominant species listed in Fig. 3b show three changes. The most abundant species in Decade I, *S. paucispinis* (23 %), became less abundant in both Decade II (9 %) and Decade III (11 %). The second most abundant species in Decade I, *S. mystinus* (11 %), became rare in both Decade II (4 %) and Decade III (5 %). The species *S. miniatus*, rare in Decade I, became the most abundant species in both Decade II (12 %) and Decade III (25 %). These compositional changes in the dominant species help explain the above findings.

As the time perspective *T* increases, more dominant shared lineages are added to the two assemblages. The differentiation between any two assemblages should therefore not increase with *T*. For *q* > 0, both differentiation measures in Fig. 6 show the expected decreasing trend, with different rates of decline for *q* = 1 and *q* = 2. Fig. 3b shows that most of the dominant, isolated species began to diverge around 6 Myr ago. Accordingly, the two differentiation profiles for *q* = 1 and 2 begin to decrease sharply around 6 Myr, especially for order *q* = 2. Node abundances near the root, where differentiation values are near zero, are relatively high and dominate the whole tree. Consequently, all phylogenetic differentiation values for *T* = 7.9 Myr (right panels of Fig. 5) are substantially lower than the corresponding non-phylogenetic values, as the *T* = 0 and *T* = 7.9 Myr panels in each row of Fig. 5 show. The two types of profiles (Figs. 5a, b and 6a, b) demonstrate that the two differentiation measures \( 1-{\overline{C}}_{qN}(T) \) and \( 1-{\overline{U}}_{qN}(T) \) incorporate differences in both tree structure and lineage abundances.

## Conclusion

1. To quantify the phylogenetic diversity of an assemblage, we suggest using two measures. The first is the phylogenetic Hill number \( {}^q\overline{D}(T) \) (Eqs. 4a and 4b), which measures "the effective number of equally abundant and equally distinct lineages, all with branch length *T*". The second is the phylogenetic or branch diversity ^{q}*PD*(*T*) (Eqs. 5a and 5b), which measures the "effective total lineage length", i.e., the total evolutionary history in an assemblage over the interval [−*T*, 0]. These two measures depend explicitly on two parameters: the abundance sensitivity parameter *q* and the time perspective (or time-depth) parameter *T*.
2. Two types of diversity profiles are recommended for conveying species/branch abundances and phylogenetic information:
   - (i) The first type is obtained by plotting ^{q}*PD*(*T*) or \( {}^q\overline{D}(T) \) as a function of order *q*, for selected values of the temporal perspective *T*. These should include *T* = 0 (the non-phylogenetic profile based on ordinary Hill numbers) and *T* = the age of the most basal node; see the upper panels of Fig. 4 for an example. It would also be informative to include *T* = the age of the divergence between the group under study and the rest of the tree.
   - (ii) The second type is obtained by plotting ^{q}*PD*(*T*) and \( {}^q\overline{D}(T) \) as functions of *T*, separately for *q* = 0, 1 and 2; see the middle and lower panels of Fig. 4 for an example. It shows the effect of time depth (or evolutionary change) on the diversity measures.
3. When there are multiple assemblages, the phylogenetic gamma Hill number is the effective number of equally abundant and equally distinct lineages in the pooled assemblage. The phylogenetic alpha Hill number is the effective number of equally abundant and equally distinct lineages per assemblage. The phylogenetic beta Hill number, the ratio of gamma to alpha, is thus interpreted as "the effective number of completely phylogenetically distinct assemblages". Alpha and beta are unrelated (or independent). The difference between the phylogenetic gamma and alpha Hill numbers is the lineage excess, which depends on both alpha and gamma. The phylogenetic beta Hill number and the lineage excess lead to the same classes of similarity and differentiation measures, listed in section "Normalized phylogenetic similarity measures". Table 1 gives the two major classes of phylogenetic overlap measures: \( {\overline{C}}_{qN}(T) \) from a local perspective and *U̅*_{qN}(*T*) from a regional perspective.
4. To assess the phylogenetic resemblance or differentiation among assemblages, we suggest two types of similarity or differentiation profiles, analogous to those in Point 2, for the two major classes of measures, \( {\overline{C}}_{qN}(T) \) and *U̅*_{qN}(*T*) (Table 1); see Figs. 5 and 6 for examples.

## Notes

### Acknowledgements

The authors thank Roseli Pellens, Samuel Scheiner and an anonymous reviewer for helpful comments and suggestions. This work was supported by the Taiwan National Science Council under Contracts 100-2118-M007-006 and 103-2628-M007-007 for AC. CC is supported by a post-doctoral fellowship, National Tsing Hua University, Taiwan. LJ was supported by a donation from John V. Moore to the Population Biology Foundation. All numerical results presented in section “An example” of this chapter were obtained by expanding the R scripts provided in Pavoine et al. (2009, their Appendix S1) to include Hill numbers and our phylogenetic measures.

### References

- Allen B, Kon M, Bar-Yam Y (2009) A new phylogenetic diversity measure generalizing the Shannon index and its application to phyllostomid bats. Am Nat 174(2):236–243
- Bryant JA, Lamanna C, Morlon H, Kerkhoff AJ, Enquist BJ, Green JL (2008) Microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity. Proc Natl Acad Sci U S A 105:11505–11511
- Cavender-Bares J, Ackerly DD, Kozak KH (2012) Integrating ecology and phylogenetics: the footprint of history in modern-day communities. Ecology 93(sp8):S1–S3
- Cavender-Bares J, Kozak KH, Fine PV, Kembel SW (2009) The merging of community ecology and phylogenetic biology. Ecol Lett 12(7):693–715
- Chao A (2005) Species estimation and applications. In: Kotz S, Balakrishnan N, Read CB, Vidakovic B (eds) Encyclopedia of statistical sciences, vol 12, 2nd edn. Wiley, New York, pp 7907–7916
- Chao A, Jost L, Chiang SC, Jiang Y-H, Chazdon RL (2008) A two-stage probabilistic approach to multiple-community similarity indices. Biometrics 64(4):1178–1186
- Chao A, Chiu C-H, Jost L (2010) Phylogenetic diversity measures based on Hill numbers. Phil Trans R Soc Lond B 365:3599–3609
- Chao A, Chiu C-H, Hsieh TC (2012) Proposing a resolution to debates on diversity partitioning. Ecology 93(9):2037–2051
- Chao A, Chiu C-H, Hsieh TC, Davis T, Nipperess DA, Faith DP (2015) Rarefaction and extrapolation of phylogenetic diversity. Methods Ecol Evol 6:380–388
- Chiu C-H, Jost L, Chao A (2014) Phylogenetic beta diversity, similarity, and differentiation measures based on Hill numbers. Ecol Monogr 84(1):21–44
- Colwell RK, Coddington JA (1994) Estimating terrestrial biodiversity through extrapolation. Philos Trans R Soc Lond B 345:101–118
- Crozier R (1997) Preserving the information content of species: genetic diversity, phylogeny, and conservation worth. Annu Rev Ecol Syst 28:243–268
- Daróczy Z (1970) Generalized information functions. Inf Control 16(1):36–51
- de Bello F, Lavorel S, Díaz S, Harrington R, Cornelissen JH, Bardgett RD, Berg MP, Cipriotti P, Feld CK, Hering D (2010) Towards an assessment of multiple ecosystem processes and services via functional traits. Biodivers Conserv 19(10):2873–2893
- Ellison AM (2010) Partitioning diversity. Ecology 91:1962–1963
- Faith DP (1992) Conservation evaluation and phylogenetic diversity. Biol Conserv 61(1):1–10
- Faith DP (2013) Biodiversity and evolutionary history: useful extensions of the PD phylogenetic diversity assessment framework. Ann NY Acad Sci 1289(1):69–89
- Faith DP, Richards ZT (2012) Climate change impacts on the tree of life: changes in phylogenetic diversity illustrated for *Acropora* corals. Biology 1(3):906–932
- Faith DP, Lozupone CA, Nipperess D, Knight R (2009) The cladistic basis for the phylogenetic diversity (PD) measure links evolutionary features to environmental gradients and supports broad applications of microbial ecology’s “phylogenetic beta diversity” framework. Int J Mol Sci 10:4723–4741
- Ferrier S, Manion G, Elith J, Richardson K (2007) Using generalized dissimilarity modelling to analyse and predict patterns of beta diversity in regional biodiversity assessment. Divers Distrib 13(3):252–264
- Gotelli NJ, Colwell RK (2011) Estimating species richness. In: Magurran AE, McGill BJ (eds) Biological diversity: frontiers in measurement and assessment. Oxford University Press, Oxford, pp 39–54
- Hannah L, Kay JA (1977) Concentration in modern industry: theory, measurement, and the U.K. experience. Macmillan, London
- Havrda J, Charvát F (1967) Quantification method of classification processes: concept of structural α-entropy. Kybernetika 3(1):30–35
- Helmus MR, Bland TJ, Williams CK, Ives AR (2007) Phylogenetic measures of biodiversity. Am Nat 169(3):E68–E83
- Hill MO (1973) Diversity and evenness: a unifying notation and its consequences. Ecology 54(2):427–432
- Horn HS (1966) Measurement of “overlap” in comparative ecological studies. Am Nat 100:419–424
- Hubbell SP (2001) The unified neutral theory of biodiversity and biogeography. Princeton University Press, Princeton
- Hurlbert SH (1971) The nonconcept of species diversity: a critique and alternative parameters. Ecology 52(4):577–586
- Hyde JR, Vetter RD (2007) The origin, evolution, and diversification of rockfishes of the genus Sebastes (Cuvier). Mol Phylogenet Evol 44:790–811
- Ives AR, Helmus MR (2010) Phylogenetic metrics of community similarity. Am Nat 176:128–142
- Ives AR, Helmus MR (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecol Monogr 81(3):511–525
- Jost L (2006) Entropy and diversity. Oikos 113(2):363–375
- Jost L (2007) Partitioning diversity into independent alpha and beta components. Ecology 88:2427–2439
- Jost L (2009) Mismeasuring biological diversity: response to Hoffmann and Hoffmann (2008). Ecol Econ 68:925–927
- Jost L (2010) Independence of alpha and beta diversities. Ecology 91:1969–1974
- Jost L, DeVries P, Walla T, Greeney H, Chao A, Ricotta C (2010) Partitioning diversity for conservation analyses. Divers Distrib 16(1):65–76
- Keylock C (2005) Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy. Oikos 109:203–207
- Kimura M, Crow JF (1964) The number of alleles that can be maintained in a finite population. Genetics 49(4):725–738
- Lozupone C, Knight R (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71(12):8228–8235
- MacArthur RH (1965) Patterns of species diversity. Biol Rev 40(4):510–533
- MacArthur RH (1972) Geographical ecology. Harper & Row, New York
- MacArthur RH, Wilson EO (1967) The theory of island biogeography. Princeton University Press, Princeton
- Magurran AE (2004) Measuring biological diversity. Blackwell, Oxford
- Magurran AE, McGill BJ (eds) (2011) Biological diversity: frontiers in measurement and assessment. Oxford University Press, Oxford
- McPeek MA, Miller TE (1996) Evolutionary biology and community ecology. Ecology 77(5):1319–1320
- Morisita M (1959) Measuring of interspecific association and similarity between communities. Memoires of the Faculty of Science, Kyushu University. Ser E (Biol) 3:65–80
- Mouchet MA, Mouillot D (2011) Decomposing phylogenetic entropy into α, β and γ components. Biol Lett 7(2):205–209
- Nipperess DA, Matsen FA IV (2013) The mean and variance of phylogenetic diversity under rarefaction. Methods Ecol Evol 4:566–572
- Patil GP, Taillie C (1979) A study on diversity profiles and orderings for a bird community in the vicinity of Colstrip, Montana. In: Patil GP, Rosenzweig M (eds) Contemporary quantitative ecology and related econometrics. International Co-operative Publishing House, Fairland, pp 23–48
- Patil GP, Taillie C (1982) Diversity as a concept and its measurement. J Am Stat Assoc 77:548–561
- Pavoine S, Love MS, Bonsall MB (2009) Hierarchical partitioning of evolutionary and ecological patterns in the organization of phylogenetically structured species assemblages: application to rockfish (genus: Sebastes) in the Southern California Bight. Ecol Lett 12(9):898–908
- Pavoine S, Baguette M, Bonsall MB (2010) Decomposition of trait diversity among the nodes of a phylogenetic tree. Ecol Monogr 80(3):485–507
- Peet RK (1974) The measurement of species diversity. Annu Rev Ecol Syst 5:285–307
- Petchey OL, Gaston KJ (2002) Functional diversity (FD), species richness and community composition. Ecol Lett 5(3):402–411
- Pielou EC (1975) Ecological diversity. Wiley, New York
- Purvis A, Hector A (2000) Getting the measure of biodiversity. Nature 405(6783):212–219
- Rao CR (1982) Diversity and dissimilarity coefficients: a unified approach. Theor Popul Biol 21(1):24–43
- Rényi A (1961) On measures of entropy and information. In: Neyman J (ed) Proceedings of the fourth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 547–561
- Ricotta C, Szeidl L (2009) Diversity partitioning of Rao’s quadratic entropy. Theor Popul Biol 76(4):299–302
- Routledge R (1979) Diversity indices: which ones are admissible? J Theor Biol 76(4):503–515
- Tilman D (2001) Functional diversity. In: Levin SA (ed) Encyclopedia of biodiversity, vol 3. Academic Press, San Diego, pp 109–120
- Tsallis C (1988) Possible generalization of Boltzmann-Gibbs statistics. J Stat Phys 52(1):479–487
- Vellend M, Cornwell WK, Magnuson-Ford K, Mooers A (2011) Measuring phylogenetic biodiversity. In: Magurran AE, McGill BJ (eds) Biological diversity: frontiers in measurement and assessment. Oxford University Press, Oxford, pp 194–207
- Warwick R, Clarke K (1995) New ‘biodiversity’ measures reveal a decrease in taxonomic distinctness with increasing stress. Mar Ecol Prog Ser 129(1):301–305
- Webb CO (2000) Exploring the phylogenetic structure of ecological communities: an example for rain forest trees. Am Nat 156:145–155
- Webb CO, Ackerly DD, McPeek MA, Donoghue MJ (2002) Phylogenies and community ecology. Annu Rev Ecol Syst 33:475–505
- Webb CO, Losos JB, Agrawal AA (2006) Integrating phylogenies into community ecology. Ecology 87(sp7):S1–S2
- Weiher E (2011) A primer of trait and functional diversity. In: Magurran AE, McGill BJ (eds) Biological diversity: frontiers in measurement and assessment. Oxford University Press, Oxford, pp 175–193
- Whittaker RH (1972) Evolution and measurement of species diversity. Taxon 21:213–251

## Copyright information

**Open Access** This chapter is distributed under the terms of the Creative Commons Attribution-Noncommercial 2.5 License (http://creativecommons.org/licenses/by-nc/2.5/) which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

The images or other third party material in this chapter are included in the work’s Creative Commons license, unless indicated otherwise in the credit line; if such material is not included in the work’s Creative Commons license and the respective action is not permitted by statutory regulation, users will need to obtain permission from the license holder to duplicate, adapt or reproduce the material.