Phylogenetic Diversity (PD) is a simple, intuitive and effective measure of biodiversity. The PD of a set of taxa, represented as the tips of a phylogenetic tree, is the sum of the branch lengths connecting those taxa (Faith 1992). PD is a particularly flexible measure because it can be applied to any set of relationships among entities that can reasonably be portrayed as a tree. Thus, the tips need not represent species; they could be higher taxa, Operational Taxonomic Units, Evolutionarily Significant Units, individual organisms or unique haplotypes. Further, the tree itself need not portray evolutionary relationships; it could instead be, for example, a cluster dendrogram portraying functional relationships among taxa (Petchey and Gaston 2002).

Since the original formulation by Faith (1992), PD has come to be not just a single measure equating to a phylogenetically weighted form of richness, but rather a general class of measures dealing with various aspects of alpha- and beta-diversity (Faith 2013). The common feature of this class of measures is the summation of branch lengths rather than the counting of tips. By substituting branch segments (intervals between nodes on a phylogenetic tree) for species, and weighting each segment by its length, it is possible to convert many of the classic measures of Species Diversity (SD) into a PD equivalent (Faith 2013). By this means, phylogenetically weighted measures of, for example, endemism (Faith et al. 2004; Rosauer et al. 2009), ecological resemblance (Ferrier et al. 2007; Nipperess et al. 2010) and entropy (Chao et al. 2010, and chapter “Phylogenetic Diversity Measures and Their Decomposition: A Framework Based on Hill Numbers”) have been developed.

In its classic form, PD, like species richness, has the property of concavity (Lande 1996). That is, the addition of individuals or sets of individuals to a community can increase PD but never decrease it. Thus, just like species richness, PD increases monotonically with increasing sampling effort, creating a classic sampling curve that reaches an asymptote when all species (and branch segments) are represented (Fig. 1). Gotelli and Colwell (2001) recognise two general types of sampling curve, individuals-based and sample-based, distinguished by whether the units on the x-axis are individual organisms or samples. Samples, in this context, are collections of individuals bounded in space and time, corresponding to the common ecological usage of the term. For PD, we can recognise a third type of sampling curve where the units on the x-axis are species or their equivalent (Fig. 1). Species, like samples, are also collections of individuals, bounded in this case by some minimum degree of relatedness. Species-based sampling curves are meaningless for species richness, which would simply increase by one with each species added, but have real value when plotting PD. For the purposes of generalisation, it is useful to be able to refer to these units (individuals, samples, species) with a single term. Chiarucci et al. (2008) used “accumulation units” to refer to individuals and samples. I extend this term to also include species as an additional unit of sampling effort in sampling curves. While these different units all measure sampling effort in some sense, they are not equivalent, and sampling curves derived from them must be interpreted differently in each case.

Fig. 1

Sampling curve showing the relationship between Phylogenetic Diversity (PD) and sampling depth. The level of sampling is measured in accumulation units of individuals, samples (collections of individuals) or species as required. PD_N is the Phylogenetic Diversity of the full set of N accumulation units. Rarefaction is the process (indicated by unidirectional arrow) of randomly subsampling (rarefying) the pool of N accumulation units to a subset of size m and calculating the expected PD of that subset (PD_m). ∆PD is the expected gain in PD between the first and second accumulation unit, and can be used as a measure of phylogenetic evenness, beta-diversity or dispersion, depending on the nature of the unit of accumulation.

Besides the units by which sampling effort is measured, Gotelli and Colwell (2001) distinguished between “accumulation curves” and “rarefaction curves”, based on the process by which the sampling curve is calculated. An accumulation curve plots a cumulatively calculated concave diversity measure against a single ordering of individuals or samples (or species). The jagged shape of the resulting curve is highly dependent on the, often arbitrary, order of the accumulation units. To resolve this problem, rarefaction curves instead plot the expected value of the diversity measure against the corresponding number of accumulation units. Rarefaction can be achieved using an algorithmic procedure of repeated random sub-sampling of the full set of accumulation units and calculating the mean diversity (Gotelli and Colwell 2001). However, Hurlbert (1971) and Simberloff (1972) showed that expected diversity can be calculated using an exact analytical solution, obviating the need for computer-intensive repeated sub-sampling. Initially, this solution was for individuals-based rarefaction curves, but it has since been shown that the same solution applies to sample-based rarefaction (Kobayashi 1974; Ugland et al. 2003; Mao et al. 2005; Chiarucci et al. 2008).

The original purpose of rarefaction was to allow the comparison of datasets with differing amounts of sampling effort (Sanders 1968). Assemblages can be compared “fairly” when rarefied to the same number of accumulation units (Gotelli and Colwell 2001). However, rarefaction has broader application than this single purpose. Depending on the unit of accumulation, the shape of the rarefaction curve provides information on ecological evenness (Olszewski 2004) and beta-diversity (Crist and Veech 2006). Rarefaction of species richness also forms the basis of estimators of total species richness that include unseen species (Colwell and Coddington 1994). In the case of PD, species-based rarefaction curves also allow for a measure of phylogenetic dispersion (Webb et al. 2002), effectively the expected PD for some given number of species (Nipperess and Matsen 2013). A solution for the rarefaction of PD is therefore desirable, as it will allow these applications to be realised for phylogenetically explicit datasets.

Rarefaction of Phylogenetic Diversity, using an algorithmic solution of repeated sub-sampling, has now been done several times (see for example Lozupone and Knight 2008; Turnbaugh et al. 2009; Yu et al. 2012). However, an analytical solution for PD rarefaction, similar to that determined by Hurlbert (1971) for species richness, is preferable, both because its results are exact (not dependent on the number of repeated subsamples) and because it is substantially more computationally efficient. Nipperess and Matsen (2013) recently published just such a solution for both the mean and variance of PD under rarefaction. This solution is quite general, being applicable to rooted and unrooted trees, and even allowing partition of the tree into smaller components than the individual branch segments. As a result, the solution is given in a very generalised form and its relationship with the classic rarefaction formula for species richness is not immediately clear.

In this chapter, I provide a detailed formulation for the exact analytical solution for expected (mean) Phylogenetic Diversity for a given amount of sampling effort. This formulation is for the specific but common case of a rooted phylogenetic tree where whole branch segments are selected under rarefaction. I use the same form of expression as Hurlbert (1971) to demonstrate the direct relationship between rarefaction of PD and rarefaction of species richness. I do not include a solution for the variance of PD under rarefaction, due to its complexity when given in this form, and instead refer the reader to Nipperess and Matsen (2013). I extend this framework to show how the initial slope of the rarefaction curve (∆PD) can be used as a flexible measure of phylogenetic evenness, phylogenetic beta-diversity or phylogenetic dispersion, depending on the unit of accumulation. I apply PD rarefaction and the derived ∆PD measure to real ecological datasets to demonstrate its usefulness in addressing ecological questions. Finally, I discuss some future directions for the extension and application of PD rarefaction.


To begin, the classic rarefaction formula for species richness will be reviewed in order to demonstrate how it can be extended to the case of Phylogenetic Diversity. The expected species richness (S) for a given amount of sampling is simply the sum of the probabilities (p) of each species occurring in a subset of m accumulation units (Eq. 1).

$$ E[S]_m=\sum_{i=1}^{S} {}_{m}p_i $$

To solve Eq. 1, we need to determine the probability (p) of each species being selected by a random draw of m accumulation units from the total set of N units. Regardless of whether the accumulation unit is an individual or a sample, this probability is a function of the frequency (n) with which species i occurs across the set of N accumulation units (Chiarucci et al. 2008). Since the set of N accumulation units is finite, random draws from it should be made without replacement, and thus p is defined by the hypergeometric distribution (Hurlbert 1971). Substituting into Eq. 1, the expected species richness is as follows (Eq. 2).

$$ E[S]_m=\sum_{i=1}^{S}\left[1-\frac{\binom{N-n_i}{m}}{\binom{N}{m}}\right] $$

The quantity within the square brackets in Eq. 2 corresponds to p in Eq. 1. Note that the expressions in curved brackets are binomial coefficients and not simple fractions, while the quantity subtracted from one within the square brackets is a fraction. The denominator in this fraction gives the number of distinct subsets of size m that can be drawn from the total set of N units. The numerator gives the number of distinct subsets of size m that do not contain species i. Equation 2 is the same as that originally proposed by Hurlbert (1971).
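As a minimal sketch of Eq. 2, the Python function below takes the occurrence counts n_i as a list together with the total number of accumulation units N (for individuals-based rarefaction, N is simply the sum of the counts). The function name `expected_richness` is hypothetical, and the abundances in the usage line are invented for illustration.

```python
from math import comb


def expected_richness(occurrences, N, m):
    """Expected species richness E[S]_m for m of N accumulation units (Eq. 2).

    occurrences: n_i, the number of units in which each species occurs.
    Draws are without replacement, so the probabilities are hypergeometric.
    """
    if not 0 <= m <= N:
        raise ValueError("m must lie between 0 and N")
    subsets = comb(N, m)  # number of distinct subsets of size m
    # comb(N - n_i, m) counts the subsets lacking species i (0 when N - n_i < m)
    return sum(1 - comb(N - n, m) / subsets for n in occurrences)


# Hypothetical individuals-based example: ten individuals of four species.
abundances = [5, 3, 1, 1]
print(expected_richness(abundances, sum(abundances), 5))
```

Because `math.comb` returns 0 when the lower index exceeds the upper one, a species that must appear in every subset of size m contributes exactly 1 without any special-casing.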

Phylogenetic Diversity is simply the sum of a set of branch lengths spanning a set of species (or, more generally, tips). So, for a set of S species, there is a corresponding set of T branch segments. Each branch segment (j) has a length (L) measured as sequence substitutions, millions of years, or some other biologically meaningful estimate of difference. Considering only rooted phylogenetic trees, PD is calculated as follows (Eq. 3).

$$ PD=\sum_{j=1}^{T} L_j $$

In the original definition intended by Faith (1992), the PD of a subset of species is calculated by summing the branch lengths connecting that set of species to the root of the tree, even when the common ancestor of that subset is not the same as the root. In this definition, a subset containing a single species (or even a single individual) has a non-zero PD value, which in this case would be the total path length from the tip to the root. This corresponds to the rooted PD value of Pardi and Goldman (2007). The alternative, called unrooted PD by Pardi and Goldman (2007), includes only the branch segments connecting a subset of species to their common ancestor, and thus a subset containing only a single species would have zero PD. The former definition, rooted PD, is adopted here because it allows for the straightforward formulation of a whole class of derived PD measures (Faith 2013), and because it is concordant with the original idea of PD acting as a surrogate for the feature diversity of a set (Faith 1992; Faith et al. 2009). Obviously, rooted PD requires a rooted phylogenetic tree, even if the choice of root is arbitrary (Nipperess and Matsen 2013).
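The difference between rooted and unrooted PD can be made concrete with a small, hypothetical tree. The sketch below assumes each branch segment is stored as a pair (length, set of descendant tips), with every tip carrying its own terminal branch; the tree ((A:1,B:1):2,(C:2,D:2):1) and the function names are illustrative only.

```python
# Hypothetical rooted, ultrametric tree ((A:1,B:1):2,(C:2,D:2):1);
# each branch segment is (length L_j, set of tips descended from it).
TREE = [
    (1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
    (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"}),
]


def rooted_pd(tree, tips):
    """Sum of the branches linking `tips` to the root (Faith 1992)."""
    tips = set(tips)
    return sum(L for L, desc in tree if desc & tips)


def unrooted_pd(tree, tips):
    """Like rooted_pd, but drops the branches between the subset's most
    recent common ancestor and the root, i.e. those ancestral to every tip."""
    tips = set(tips)
    return sum(L for L, desc in tree if desc & tips and not tips <= desc)


for subset in (["A"], ["A", "B"], ["A", "C"]):
    print(subset, rooted_pd(TREE, subset), unrooted_pd(TREE, subset))
```

A single tip therefore has a rooted PD equal to its path length to the root (3 on this tree) but an unrooted PD of zero; the two definitions agree only when the subset's common ancestor is the root.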

Given this definition, the rarefaction of PD involves finding the expected (average) sum of branch lengths (including the path to the root) for all possible distinct subsets of m accumulation units (Fig. 2). This is achieved by extending the classic rarefaction formula through a substitution of species for branch segments in a phylogenetic tree. Since PD is simply the sum of branch lengths, then the expected PD must also be the sum of branch lengths, each weighted by the probability (q) of its occurrence in a subset of size m (O’Dwyer et al. 2012). So, for a rooted phylogenetic tree represented as a set of T branch segments, the expected PD is given as follows (Eq. 4).

Fig. 2

An illustration of the process of rarefying Phylogenetic Diversity (PD) by units of individuals. An initial sample of ten individuals (N = 10) distributed among four tips (species) is rarefied to a subset of five individuals (m = 5) by a process of random sampling without replacement. For the rarefied samples, 2 of the 252 possible subsets are shown. The expected PD under rarefaction is the average sum of branch lengths represented by each of these distinct subsets. The branch lengths summed to calculate PD are black while those not represented (and thus not summed) are grey. Note that the rooted definition of PD is used, where the path length to the root is always included, even in the case where only a single tip is represented.

$$ E[PD]_m=\sum_{j=1}^{T} L_j\times {}_{m}q_j $$

The probability of each branch segment occurring in a subset is again a function of the frequency with which it occurs among accumulation units. The frequency of occurrence of a particular branch segment (o) depends on the frequency of occurrence of the species that are descended from that branch segment. Let x_ij be a binary value indicating whether species i is (1) or is not (0) a descendant of branch segment j. Multiplying x_ij by n_i and summing across all species gives the total number of occurrences of branch segment j among the N accumulation units (Eq. 5). (When the accumulation units are samples, this sum would count a sample containing several descendants of branch j more than once; o_j is then better taken as the number of samples in which at least one descendant of branch j occurs.)

$$ o_j=\sum_{i=1}^{S}\left(n_i\times x_{ij}\right) $$
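For individuals-based data, Eq. 5 is a simple matrix-vector product, which the short sketch below spells out with plain lists. The species abundances and the incidence matrix x (rows are species, columns are branch segments of the hypothetical tree ((A:1,B:1):2,(C:2,D:2):1)) are invented for illustration.

```python
# Hypothetical abundances n_i of species A, B, C, D (ten individuals in total).
n = [5, 3, 1, 1]

# x[i][j] = 1 if species i descends from branch segment j, else 0.
# Branch order: A, B, (A,B), C, D, (C,D); a terminal branch is ancestral to its own tip.
x = [
    [1, 0, 1, 0, 0, 0],  # A
    [0, 1, 1, 0, 0, 0],  # B
    [0, 0, 0, 1, 0, 1],  # C
    [0, 0, 0, 0, 1, 1],  # D
]


def branch_occurrences(n, x):
    """o_j = sum_i n_i * x_ij (Eq. 5) for every branch segment j."""
    return [sum(n_i * row[j] for n_i, row in zip(n, x)) for j in range(len(x[0]))]


print(branch_occurrences(n, x))
```

The internal branch leading to A and B occurs eight times because every individual of either species carries it.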

Thus, by summing across branches instead of species, substituting branch occurrence for species occurrence, and including a branch length weighting, we are able to adapt the classic rarefaction formula for species richness for the purposes of calculating expected Phylogenetic Diversity (Eq. 6). Note this solution is equivalent to that of Nipperess and Matsen (2013) but is expressed in an expanded form for the specific case of calculating rooted PD. Equation 6 is very similar to the solution for expected PD of Faith (2013) but differs in that random draws are without replacement, following the hypergeometric distribution.

$$ E[PD]_m=\sum_{j=1}^{T}\left[L_j\times\left(1-\frac{\binom{N-o_j}{m}}{\binom{N}{m}}\right)\right] $$
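A minimal sketch of Eq. 6 follows, together with a brute-force check that averages rooted PD over every one of the C(N, m) subsets of individuals, in the spirit of Fig. 2. The tree, the abundances and the function names are hypothetical, and the exhaustive check is only practical for a tiny N.

```python
from itertools import combinations
from math import comb

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1): (length, descendant tips) per branch.
TREE = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
        (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"})]
individuals = ["A"] * 5 + ["B"] * 3 + ["C", "D"]  # N = 10 individuals
N = len(individuals)
lengths = [L for L, _ in TREE]
# o_j: number of individuals descended from branch j (Eq. 5)
occ = [sum(tip in desc for tip in individuals) for _, desc in TREE]


def expected_pd(lengths, occurrences, N, m):
    """E[PD]_m (Eq. 6): each L_j weighted by the hypergeometric probability
    that at least one of its o_j occurrences is among the m units drawn."""
    subsets = comb(N, m)
    return sum(L * (1 - comb(N - o, m) / subsets)
               for L, o in zip(lengths, occurrences))


def brute_force_pd(m):
    """Mean rooted PD over all distinct subsets of m individuals."""
    pds = [sum(L for L, desc in TREE if desc & set(sub))
           for sub in combinations(individuals, m)]
    return sum(pds) / len(pds)


for m in range(1, N + 1):
    print(m, round(expected_pd(lengths, occ, N, m), 4), round(brute_force_pd(m), 4))
```

At m = N every branch with o_j ≥ 1 has probability 1, so E[PD]_N is simply the observed PD (9 on this tree).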

Finally, it is now possible to calculate the expected PD for a given number of species. A species, in this context, is simply a collection of individuals in much the same way as a sample is a collection of individuals, and the same equations apply. Under these circumstances, o_j is equal to the sum of x_ij (over all species), as n_i will always equal 1, and N is equal to S. Substituting into Eq. 6 gives the following formula for rarefaction by species (Eq. 7).

$$ E[PD]_m=\sum_{j=1}^{T}\left[L_j\times\left(1-\frac{\binom{S-\sum_{i=1}^{S} x_{ij}}{m}}{\binom{S}{m}}\right)\right] $$
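Eq. 7 needs only the tree itself: each o_j becomes the number of species below branch j, and N becomes S. The sketch assumes a hypothetical representation of each branch as (length, set of descendant tips) on the invented ultrametric tree ((A:1,B:1):2,(C:2,D:2):1).

```python
from math import comb

# Hypothetical ultrametric tree ((A:1,B:1):2,(C:2,D:2):1), depth 3.
TREE = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
        (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"})]


def expected_pd_species(tree, m):
    """E[PD]_m for m species drawn without replacement (Eq. 7).

    Assumes every tip has its own terminal branch in `tree`, so the union
    of the descendant sets is the full species list (S species)."""
    S = len(set().union(*(desc for _, desc in tree)))
    subsets = comb(S, m)
    # len(desc) plays the role of sum_i x_ij, the species descended from branch j
    return sum(L * (1 - comb(S - len(desc), m) / subsets) for L, desc in tree)


print([expected_pd_species(TREE, m) for m in range(1, 5)])
```

With one species drawn, the expected PD equals the root-to-tip depth (3), and with all S species it equals the total PD of the tree (9).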


It has previously been recognised (Lande 1996; Olszewski 2004) that there is a relationship between individuals-based rarefaction curves and measures of evenness. Specifically, the initial slope of the individuals-based curve for species richness is equal to the PIE (Probability of Interspecific Encounter) index of Hurlbert (1971). The initial slope of the rarefaction curve is the difference between the expected species richness for two individuals (m = 2) and the expected species richness for one individual (m = 1), and is the probability that the second individual will be a different species from the first (Olszewski 2004). The PIE index is directly related to the Gini-Simpson index, the probability that two individuals selected at random will be different species. The difference between these two indices is in the form of random sampling: Gini-Simpson samples with replacement (thus assuming infinite population size) while PIE, just like rarefaction, samples without replacement. Following Olszewski (2004), PIE can be expressed as follows (Eq. 8), where E[S_1] and E[S_2] refer to the expected species richness of one and two randomly drawn individuals, respectively. Note that E[S_1] always equals one in this case.

$$ PIE=E\left[{S}_2\right]-E\left[{S}_1\right] $$
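Eq. 8 can be checked numerically against Hurlbert's closed form, PIE = N/(N - 1) × (1 - Σ p_i²), where p_i = n_i/N. The sketch below uses invented abundances and hypothetical function names; the factor N/(N - 1) is exactly what separates PIE from the with-replacement Gini-Simpson index 1 - Σ p_i².

```python
from math import comb


def expected_richness(abundances, m):
    """Individuals-based E[S]_m (Eq. 2) with N = total abundance."""
    N = sum(abundances)
    return sum(1 - comb(N - n, m) / comb(N, m) for n in abundances)


def hurlbert_pie(abundances):
    """Hurlbert's (1971) PIE: N/(N-1) * (1 - sum p_i^2), without replacement."""
    N = sum(abundances)
    return N / (N - 1) * (1 - sum((n / N) ** 2 for n in abundances))


def gini_simpson(abundances):
    """1 - sum p_i^2, the with-replacement counterpart of PIE."""
    N = sum(abundances)
    return 1 - sum((n / N) ** 2 for n in abundances)


abundances = [5, 3, 1, 1]  # hypothetical counts, N = 10
slope = expected_richness(abundances, 2) - expected_richness(abundances, 1)  # Eq. 8
print(slope, hurlbert_pie(abundances), gini_simpson(abundances))
```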

When considering a sample-based curve, it is clear that the initial slope is related to the beta-diversity of the set of samples from which the curve is calculated. In this case, the difference between E[S_1] and E[S_2] is the expected number of species in the second sample that are not found in the first. Thus, the PIE index can be used to measure beta-diversity if applied to sample-based rarefaction. This interpretation is directly related to the additive partitioning of species diversity into alpha and beta components, where alpha-diversity is the mean (expected) richness of a single sample and beta-diversity is the gain in species richness from a single sample to a larger set of samples; both can be read directly from a rarefaction curve (Crist and Veech 2006).
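For sample-based rarefaction of species richness, the same formula applies with n_i taken as the number of samples containing species i and N as the number of samples. The hypothetical incidence data below illustrate that E[S]_1 is the mean richness of a single sample (alpha), E[S]_N is the pooled richness (gamma), and the initial slope equals the average number of species in a second sample that were absent from the first. Names and data are illustrative only.

```python
from itertools import permutations
from math import comb


def expected_richness(occurrences, N, m):
    """E[S]_m (Eq. 2) for m of N samples; occurrences[i] = samples holding species i."""
    return sum(1 - comb(N - o, m) / comb(N, m) for o in occurrences)


# Hypothetical incidence data: the species recorded in each of four samples.
samples = [{"a", "b"}, {"a", "c"}, {"b", "c", "d"}, {"a"}]
N = len(samples)
species = sorted(set().union(*samples))
occ = [sum(sp in s for s in samples) for sp in species]

alpha = expected_richness(occ, N, 1)  # mean richness of one sample
gamma = expected_richness(occ, N, N)  # richness of all samples pooled
beta_additive = gamma - alpha         # additive partitioning (Lande 1996)
slope = expected_richness(occ, N, 2) - alpha

# Direct check: mean number of new species in an ordered pair of distinct samples.
new_in_second = [len(s2 - s1) for s1, s2 in permutations(samples, 2)]
print(alpha, gamma, beta_additive, slope, sum(new_in_second) / len(new_in_second))
```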

It follows that we can also define measures of phylogenetic evenness and phylogenetic beta-diversity using the initial slope of the PD rarefaction curve, where the units of accumulation are either individuals or samples, respectively (Fig. 1). In either case, the initial slope is the expected gain in PD (∆PD) when adding a second accumulation unit to the first. Further, because PD rarefaction curves can also meaningfully use species as accumulation units, we can extend this idea to include a measure of phylogenetic dispersion, where the gain in PD is the expected branch length in the lineage (path from tip to root) of a second randomly selected species that is not shared with the first. Thus, we can define a general measure (∆PD) for phylogenetic evenness, phylogenetic beta-diversity or phylogenetic dispersion, depending on the accumulation units chosen (Eq. 9; see also Fig. 1). ∆PD is very similar to the ∆PDq measure of Faith (2013), although in that case probabilities are not derived from the hypergeometric distribution. Further, ∆PDq is specifically applied to the problem of estimating loss of PD from extinction, a problem that is mathematically similar to rarefaction.

$$ \varDelta PD=E\left[P{D}_2\right]-E\left[P{D}_1\right] $$

If branch lengths are measured in millions of years between branching events, then ∆PD is measured in units that make intuitive sense and allow direct comparison across trees and systems. Alternatively, one could standardise the measure by dividing by its theoretical maximum. ∆PD will be at its maximum when all individuals, species or samples represent wholly distinct lineages with no shared branch lengths. For an ultrametric tree, the lineage length (path from tip to root) is invariant across species and is equal to the depth of the tree. When rarefaction is by units of individuals or species, E[PD_1] is therefore the lineage length (or, for a non-ultrametric tree, the expected lineage length). When rarefaction is by units of samples, E[PD_1] will equal the average PD of a sample and will be equal to ∆PD in the extreme case where each sample shares no branch length with any other sample. Thus, whether referring to units of individuals, species or samples, E[PD_1] represents the theoretical maximum of ∆PD and can be used to standardise the measure as follows.

$$ \varDelta P{D}_{standard}=\frac{\varDelta PD}{\varDelta P{D}_{\max }}=\frac{E\left[P{D}_2\right]-E\left[P{D}_1\right]}{E\left[P{D}_1\right]} $$
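Eq. 9 and its standardised form reduce to two evaluations of Eq. 6. The sketch below is hypothetical in its names and data (tree ((A:1,B:1):2,(C:2,D:2):1) with abundances 5, 3, 1 and 1, so N = 10); because this tree is ultrametric, E[PD_1] equals the tree depth of 3.

```python
from math import comb

# Hypothetical branch lengths L_j and individual counts o_j (Eq. 5) for the
# branches A, B, (A,B), C, D, (C,D) of ((A:1,B:1):2,(C:2,D:2):1).
lengths = [1.0, 1.0, 2.0, 2.0, 2.0, 1.0]
occ = [5, 3, 8, 1, 1, 2]
N = 10


def expected_pd(lengths, occurrences, N, m):
    """E[PD]_m from Eq. 6."""
    return sum(L * (1 - comb(N - o, m) / comb(N, m))
               for L, o in zip(lengths, occurrences))


def delta_pd(lengths, occurrences, N, standardise=False):
    """Initial slope E[PD_2] - E[PD_1] (Eq. 9); optionally divided by its
    theoretical maximum, E[PD_1]."""
    if N < 2:
        raise ValueError("at least two accumulation units are needed")
    pd1 = expected_pd(lengths, occurrences, N, 1)
    gain = expected_pd(lengths, occurrences, N, 2) - pd1
    return gain / pd1 if standardise else gain


print(expected_pd(lengths, occ, N, 1), delta_pd(lengths, occ, N), delta_pd(lengths, occ, N, True))
```

Expanding Eq. 6 for m = 1 and m = 2 (our own algebra) shows that each branch contributes L_j o_j (N - o_j)/(N(N - 1)) to ∆PD, so branches carried by all or none of the units add nothing; here the total is 130/90 ≈ 1.444.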


The following is a demonstration of the application of PD rarefaction, and the derived ∆PD statistics, to real ecological datasets. These applications are not intended to provide definitive answers to ecologically important questions but are, rather, simple demonstrations of how PD rarefaction can allow new analyses to be undertaken and, hopefully, new insights gained.

In all these applications, I have used published data on mammals. This is principally for convenience, as mammals (Bininda-Emonds et al. 2007) and birds (Jetz et al. 2012) are the only major taxonomic groups for which comprehensive species-level supertrees are available. I have used an updated version of the mammal supertree of Bininda-Emonds et al. (2007) published as supplementary material by Fritz et al. (2009). In this supertree, all branch lengths are measured in units of time (millions of years between branching events), allowing for a straightforward interpretation of PD as cumulative evolutionary history (Proches et al. 2006).

All analyses were conducted using the statistical software, R version 2.15.2 (R Core Team 2012). Phylogenetic information was processed using the ape package in R (Paradis et al. 2004). PD rarefaction analyses used the phylodiv, phylocurve and phylorare functions, written by the author and available from:

Standardisation of Sampling

The most commonly used application for rarefaction is standardisation to allow comparisons to be made between datasets with differing amounts of sampling effort. Standardisation can be achieved by rarefying all datasets back to a common (typically the minimum) number of accumulation units (Sanders 1968; Gotelli and Colwell 2001).
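The standardisation step can be sketched as rarefying every site to the smallest total across sites. The sites, abundances and tree below are invented and are not the Law et al. (1998) data; they simply show how a well-sampled site can rank above a poorly sampled one on raw PD yet below it once both are rarefied to a common number of individuals.

```python
from math import comb

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1): (length, descendant tips) per branch.
TREE = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
        (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"})]


def expected_pd(abundances, m):
    """Individuals-based E[PD]_m (Eq. 6) for a dict mapping tip -> count."""
    N = sum(abundances.values())
    subsets = comb(N, m)
    total = 0.0
    for L, desc in TREE:
        o = sum(abundances.get(tip, 0) for tip in desc)  # Eq. 5
        total += L * (1 - comb(N - o, m) / subsets)
    return total


# Hypothetical sites with unequal sampling effort.
sites = {
    "site1": {"A": 20, "B": 10, "C": 1, "D": 1},  # 32 individuals
    "site2": {"A": 3, "C": 3},                    # 6 individuals
}
m_common = min(sum(a.values()) for a in sites.values())
for name, a in sites.items():
    raw = expected_pd(a, sum(a.values()))  # E[PD]_N is the observed PD
    print(name, raw, round(expected_pd(a, m_common), 3))
```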

Law et al. (1998) surveyed bats in ten State Forests of the south-west slopes region of New South Wales, Australia. Survey methods were a combination of ultrasonic detectors, harp-traps, mist-nets and trip-lines. For the purposes of this demonstration, only data from the harp-traps will be used. A harp-trap is a rectangular frame, strung vertically with nylon line, placed so as to intercept the flight path of low-flying bats (Tidemann and Woodside 1978). A bat striking the nylon lines of the trap will tumble down into a collecting bag at the bottom.

Sampling effort among State Forests was variable, ranging from 8 to 30 trap-nights. Comparison of bat diversity between State Forests is therefore confounded by variation in sampling effort, as can be seen when plotting separate PD rarefaction curves for each State Forest (Fig. 3). To correct for variation in trapping effort, expected PD for each State Forest was calculated for the common value of 15 individuals, which was the minimum number recovered from a State Forest (Fig. 3). While rarefying to eight trap-nights (samples) would also be an appropriate method of standardisation, data on the bat species caught per trap-night were not available in Law et al. (1998). Standardising for sampling effort changed the rank order of the sites for Phylogenetic Diversity (Table 1). The rank correlation between the standardised and non-standardised PD values was relatively high but not significant (Spearman’s rho = 0.57, p = 0.084). Therefore, what one concludes about the relative bat diversity (and perhaps conservation importance) of these sites depends on whether or not sampling effort is taken into account.

Fig. 3

An example of standardisation of Phylogenetic Diversity (PD) by rarefaction. Data are abundances of bats caught in harp-traps in State Forests of the south-west slopes region of New South Wales, Australia. See Law et al. (1998) for a description of the data. Plotting separate individuals-based curves (grey lines) for each site shows considerable variation in sampling effort, with the raw value of PD being dependent on the number of trapped individuals. To allow for comparison between sites, PD is rarefied to an expected value for 15 individuals for all sites (indicated by black vertical line).

Table 1 Comparison of diversity measures for bat assemblages for ten state forests of the south-west slopes region of New South Wales, Australia

Phylogenetic Evenness

The extension of PD rarefaction to ∆PD allows for the measurement of phylogenetic evenness, which is essentially a measure of the distribution of individuals among branches in a phylogenetic tree (Webb and Pitman 2002). A phylogenetically even community is one where the most evolutionarily distinct species are also the most abundant. Because ∆PD will increase with both increasing phylogenetic evenness and phylogenetic diversity, it is more correctly a measure of entropy (Jost 2006), directly comparable to the PIE and Gini-Simpson indices. It has a particularly close relationship with the quadratic entropy measure of Rao (1982). Rao’s quadratic entropy measures the average distance between individuals in an assemblage. When that distance is measured as patristic distance (path length on a phylogenetic tree), ∆PD will be approximately half of Rao’s quadratic entropy. ∆PD is also similar in intent, but not in form, to the phylogenetic entropy index of Allen et al. (2009).
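The link to Rao's quadratic entropy can be checked numerically. Under the assumptions of this sketch (a rooted tree with a terminal branch for every tip, Rao's Q computed from relative abundances with replacement, and patristic distance taken as the sum of branches separating two tips), expanding Eq. 6 for m = 1 and m = 2 gives ∆PD = Σ_j L_j o_j (N - o_j)/(N(N - 1)), while Q = 2 Σ_j L_j o_j (N - o_j)/N². Hence ∆PD = N/(N - 1) × Q/2: the factor N/(N - 1) reflects sampling without replacement and approaches 1 as N grows. The tree and abundances are hypothetical.

```python
from math import comb

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1) and abundances.
TREE = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
        (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"})]
abund = {"A": 5, "B": 3, "C": 1, "D": 1}
N = sum(abund.values())


def expected_pd(m):
    """Individuals-based E[PD]_m (Eq. 6)."""
    return sum(L * (1 - comb(N - sum(abund[t] for t in desc), m) / comb(N, m))
               for L, desc in TREE)


def patristic(a, b):
    """Path length between tips a and b: branches above exactly one of them."""
    return sum(L for L, desc in TREE if (a in desc) != (b in desc))


delta_pd = expected_pd(2) - expected_pd(1)
rao_q = sum(abund[a] * abund[b] * patristic(a, b) for a in abund for b in abund) / N ** 2
print(delta_pd, rao_q / 2, N / (N - 1) * rao_q / 2)
```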

Low ecological evenness may be an indicator of disturbance where a small number of species are favoured. If those favoured species are also closely related, due to sharing a trait that allows exploitation of disturbance events, we can expect a reduction in phylogenetic evenness (Helmus et al. 2010). Medellin et al. (2000) surveyed the bat assemblages along a disturbance gradient in the Selva Lacandona, Chiapas, Mexico. The disturbance gradient consisted of four habitats which, in order of decreasing disturbance, were cornfield, oldfield, cacao plantation and forest. Bats were sampled using mist nets, and each habitat in the disturbance gradient was sampled with the same effort, making it possible to compare habitats without the need for rarefaction. Medellin et al. (2000) found a trend of decreasing species richness and species evenness with increasing disturbance, and this trend is also reflected in the phylogenetic diversity and evenness of the assemblages (Table 2, Fig. 4).

Table 2 Comparison of diversity measures for bat assemblages from four habitats along a disturbance gradient in the Selva Lacandona, Chiapas, Mexico
Fig. 4

Individuals-based PD rarefaction curves for bat assemblages from four habitats along a disturbance gradient in the Selva Lacandona, Chiapas, Mexico. See Medellin et al. (2000) for a description of the data. Phylogenetic evenness (∆PD) values are highest in the least disturbed habitat (Forest) and lowest in the most disturbed habitat (Cornfield).

The trend in phylogenetic evenness may simply reflect the abundance distribution among species. To determine the phylogenetic contribution to phylogenetic evenness, ∆PD was divided by the PIE index (Table 2). Since PIE is the probability that the second randomly selected individual is a different species from the first, and the gain in PD is zero when it is the same species, dividing ∆PD by PIE gives the expected branch length contributed by that second species (conditional on it being a different species). This value is related to phylogenetic dispersion (∆PD from a species-based rarefaction curve) but differs due to the conditional probability structure, and effectively measures the pure phylogenetic contribution to ∆PD independent of the abundance distribution among species. We see, in this case, that the phylogenetic component generally decreases with increasing disturbance (Cacao being the exception), supporting the notion that disturbance favours more closely related species.
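The conditional interpretation of ∆PD/PIE can be checked by brute force on hypothetical data: enumerate every ordered pair of distinct individuals, record the gain in rooted PD from adding the second to the first, and compare the mean gain among pairs of different species with ∆PD/PIE. The tree, individuals and helper names are illustrative only.

```python
from itertools import permutations

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1) and ten individuals.
TREE = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
        (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"})]
individuals = ["A"] * 5 + ["B"] * 3 + ["C", "D"]


def rooted_pd(tips):
    """Rooted PD: every branch ancestral to at least one of the tips."""
    tips = set(tips)
    return sum(L for L, desc in TREE if desc & tips)


pairs = list(permutations(individuals, 2))  # ordered draws without replacement
gains = [rooted_pd(pair) - rooted_pd(pair[:1]) for pair in pairs]

delta_pd = sum(gains) / len(pairs)                # E[PD_2] - E[PD_1]
pie = sum(a != b for a, b in pairs) / len(pairs)  # P(second is a new species)
different = [g for (a, b), g in zip(pairs, gains) if a != b]
print(delta_pd / pie, sum(different) / len(different))
```

Pairs of the same species contribute zero gain, which is why dividing by PIE recovers the conditional mean.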

Phylogenetic Beta-Diversity

Phylogenetic beta-diversity is effectively the turnover of branch lengths between samples in space and/or time. Like its species-level equivalent, phylogenetic beta-diversity can be measured on a pair-wise basis (Lozupone and Knight 2005; Bryant et al. 2008; Nipperess et al. 2010) or as a single value for a set of samples (Anderson et al. 2010). Rarefaction of PD provides a means for deriving a single value of beta-diversity for a set of samples of any size via the ∆PD measure, which is a phylogenetic analogue of the additive partitioning approach of Crist and Veech (2006).

Morton et al. (1994) compiled data on small mammal assemblages for 245 sites in arid Australia. I calculated beta-diversity for two regions from this dataset: Tanami Desert and Uluru-Kata Tjuta National Park, Northern Territory. These regions had a similar number of sites (Table 3) covering a roughly similarly sized area but differed in the number of vegetation types. The Tanami sites were all spinifex grassland while the Uluru sites comprised a mix of spinifex grassland, acacia shrubland and woodland (Morton et al. 1994). It might be expected, therefore, that the Uluru sites would show higher beta-diversity due to the diversity of habitats represented. In addition to ∆PD, I used the additive partitioning method to calculate species-level beta-diversity as the difference between the total species richness of all sites in a region and the mean species richness of a single site (Lande 1996; Crist and Veech 2006).

Table 3 Comparison of diversity measures for small mammal assemblages of sites in the Tanami Desert and Uluru-Kata Tjuta National Park, Northern Territory, Australia

Contrary to expectations, the Tanami desert sites showed greater species beta-diversity and phylogenetic beta-diversity despite the lack of variation in vegetation type (Table 3). This pattern is driven by the much higher site-level (alpha) species richness in Uluru-Kata Tjuta National Park (Table 3, Fig. 5) without a concomitant increase in overall (gamma) species richness, resulting in a high degree of species overlap. Given the overlap in species among Uluru sites, it appears that most small mammals are not specialised for particular vegetation types.

Fig. 5

Sample-based rarefaction curves for small mammal assemblages of sites in the Tanami Desert and Uluru-Kata Tjuta National Park, Northern Territory, Australia. See Morton et al. (1994) for a description of the data. Phylogenetic beta-diversity (∆PD) is higher among the Tanami sites than the Uluru sites.

Phylogenetic Dispersion

Phylogenetic dispersion is a measure of the average phylogenetic distance among species (or tips) (Webb et al. 2002) and is in effect a measure of tree shape (Davies and Buckley 2012). ∆PD provides a simple, intuitive measure of dispersion as the expected gain in PD of adding a second randomly selected species to the first. It can also be seen as a means of correcting for variation in species richness among samples, as it is well known that PD increases with species richness (Rodrigues and Gaston 2002).
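For species-based rarefaction, expanding Eq. 7 for m = 1 and m = 2 (our own algebra) gives ∆PD = Σ_j L_j s_j (S - s_j)/(S(S - 1)), where s_j is the number of species descended from branch j. This is exactly half the mean pairwise patristic distance (MPD) among the species, the average phylogenetic distance referred to above. The sketch below checks this on a hypothetical four-species tree; names and data are illustrative only.

```python
from itertools import combinations
from math import comb

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1): (length, descendant tips).
TREE = [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
        (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"})]
species = sorted(set().union(*(desc for _, desc in TREE)))
S = len(species)


def expected_pd_species(m):
    """Species-based E[PD]_m (Eq. 7)."""
    return sum(L * (1 - comb(S - len(desc), m) / comb(S, m)) for L, desc in TREE)


def patristic(a, b):
    """Sum of the branches lying above exactly one of the two species."""
    return sum(L for L, desc in TREE if (a in desc) != (b in desc))


delta_pd = expected_pd_species(2) - expected_pd_species(1)
mpd = sum(patristic(a, b) for a, b in combinations(species, 2)) / comb(S, 2)
print(delta_pd, mpd / 2)
```

Because it is an average over pairs of species rather than a sum over branches, this quantity does not increase mechanically with the number of species present.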

I generated PD rarefaction curves and ∆PD values for the mammal faunas of 71 of the 79 terrestrial ecoregions recognised by Olson et al. (2001) as constituting the Australasian biogeographic realm. Data were sourced from the wildfinder database ( of the World Wildlife Fund. Eight ecoregions were excluded from the analysis because they had fewer than two species, so a ∆PD value could not be calculated.

The ecoregions show huge variation in species richness and, as expected, Phylogenetic Diversity is highly dependent on species richness (Fig. 6). Tropical ecoregions (such as the Central Range montane rainforests, New Guinea) have high species richness and high Phylogenetic Diversity (Fig. 6, Table 4). When considering phylogenetic dispersion, however, other ecoregions show unusually high or low values given their species richness (Table 4). The ecoregion with the lowest ∆PD was the New Caledonia dry forests. Because of its isolation, this fauna consists exclusively of bats, and thus all the species are relatively closely related. The ecoregion with the highest ∆PD was the Mount Lofty woodlands of South Australia, reflecting relatively high numbers of marsupial species compared to the more tropically distributed bats and rodents.

Fig. 6

Species-based rarefaction curves for mammal assemblages of terrestrial ecoregions of the Australasian biogeographic realm. Ecoregions are as defined by Olson et al. (2001). Data are sourced from the wildfinder database. Three ecoregions are highlighted as having the minimum (New Caledonia dry forests), maximum (Mount Lofty woodlands) or median (Central Range montane rainforests) values of phylogenetic dispersion (∆PD).

Table 4 Comparison of diversity measures for mammal assemblages of selected ecoregions of the Australasian biogeographic realm

Future Directions

As demonstrated here, rarefaction of PD has a straightforward application in standardising PD across samples so that they can be compared directly. Further, depending on the accumulation unit, the rarefaction formula can be extended to the calculation of metrics of phylogenetic evenness, phylogenetic beta-diversity and phylogenetic dispersion. However, the application of the PD rarefaction formula and its extension to other metrics is still very much in its infancy. Here I will outline some future directions for PD rarefaction.

Rarefaction by units of species allows for the comparison of locations while controlling for variation in species richness. This can easily be done either by rarefying all locations to a given number of species (Nipperess and Matsen 2013) or via ∆PD as demonstrated here. This kind of correction has previously been made by including species richness as an explanatory variable in a statistical model and taking the residuals (Davies et al. 2008) or by comparison to a null model derived by repeated subsampling (Davies et al. 2007). The latter method is often used as a statistical test of phylogenetic dispersion (also known as phylogenetic structure), where random draws are taken from a species pool, representing a null community assembly process (Webb 2000). Such methods are no longer necessary, as the exact relationship between species richness and PD is described by the rarefaction curve (Nipperess and Matsen 2013). Further, the exact analytical solution is computationally efficient, allowing for practical application to very large datasets.
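
The claim that repeated random subsampling is unnecessary can be checked on a toy example. In the Python sketch below, an exact species-based rarefaction value and a Monte Carlo null model (random draws of k species from a pool) estimate the same quantity; the Monte Carlo figure only approaches the exact one as the number of replicates grows. The pool tree, the observed community, the rooted-PD convention and the simple "observed minus expected" standardisation are hypothetical choices for illustration and are not necessarily the form of relative PD used in the studies cited above.

```python
import random
from math import comb

# Hypothetical species pool with tree ((A:1,B:1):2,(C:2,D:2):1), stored as
# (branch length, tips subtended) pairs; PD here includes the root path.
POOL = [
    (1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
    (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"}),
]
TIPS = sorted(set().union(*(tips for _, tips in POOL)))


def pd(branches, community):
    """Sum of lengths of branches leading to at least one community member."""
    return sum(length for length, tips in branches if tips & community)


def exact_expected_pd(branches, n_pool, k):
    """Species-based rarefaction: expected PD of k species drawn from the pool."""
    total = comb(n_pool, k)
    return sum(length * (1 - comb(n_pool - len(tips), k) / total)
               for length, tips in branches)


def monte_carlo_expected_pd(branches, tips, k, reps=20000, seed=1):
    """Null-model approach: average PD over repeated random draws of k species."""
    rng = random.Random(seed)
    return sum(pd(branches, set(rng.sample(tips, k))) for _ in range(reps)) / reps


observed = {"A", "B"}
k = len(observed)
expected = exact_expected_pd(POOL, len(TIPS), k)
print(expected, monte_carlo_expected_pd(POOL, TIPS, k))
# Negative values mean less PD than expected for this many species (clustering).
print("PD minus expected PD:", pd(POOL, observed) - expected)
```

Here the exact expected PD for two species is 5.5, so the observed pair {A, B}, with a PD of 4, is less phylogenetically dispersed than a random pair from the same pool. The exact value needs one pass over the branches, whereas the null model needs thousands of draws to settle near the same number.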

By removing the effect of species richness, we can identify “evolutionary hotspots” with higher than expected phylogenetic diversity (Davies et al. 2008; Nipperess and Matsen 2013) on a regional or global scale. We can then use the standardised PD values (called relative PD by Davies et al. 2007) to explore the environmental, ecological and historical processes that lead to the observed patterns of high or low phylogenetic dispersion (Kooyman et al. 2013). Ultimately, we may be able to develop the theory to predict these patterns (Davies et al. 2007), in a similar vein to what has been done for species richness (Arrhenius 1921; MacArthur and Wilson 1963; Rosindell et al. 2011). For example, the relationship of species richness with area is well known but the phylogeny-area relationship has only recently begun to be explored (Morlon et al. 2011). Rarefaction curves have an obvious connection to species-area curves (Olszewski 2004) and thus the development of PD rarefaction may well improve understanding of the phylogeny-area relationship. In particular, species-based rarefaction of PD allows for the separation of species diversity effects from those purely explained by phylogeny.

It should be possible to predict, from the observed rarefaction curve, how much Phylogenetic Diversity remains to be sampled. Rarefaction is the basis of several species diversity estimators, which attempt to calculate total diversity (including unseen species) for a set of individuals or samples by effectively extending the curve beyond the observed sampling depth (Colwell and Coddington 1994). It follows that a useful extension of PD rarefaction would be a PD estimator that predicts unseen branch length, given the observed rate of accumulation of PD. It is important to note that PD rarefaction calculates the expected branch length gained by adding further accumulation units but does not predict where on the tree these branches will come from. Similarly, a biodiversity estimator based on PD rarefaction may be able to predict the amount of PD not yet sampled but would not be able to predict where these unseen branches would attach to the existing tree. Such an estimator would nevertheless be an exciting development.
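
A PD estimator of this kind would extrapolate from the end of the observed curve. The Python sketch below does not estimate unseen PD; it only computes an individual-based PD rarefaction curve, assuming the same hypergeometric form with N_i counting the individuals that descend from branch i, and shows what the terminal slope depends on. At n = N − 1 a branch can be missed only if exactly one individual descends from it, so the final gain equals the summed length of such "singleton branches" divided by N, the PD analogue of singleton species in classic rarefaction. The tree and abundances are hypothetical.

```python
from math import comb

# Hypothetical tree ((A:1,B:1):2,(C:2,D:2):1) and hypothetical abundances.
TREE = [
    (1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}),
    (2.0, {"C"}), (2.0, {"D"}), (1.0, {"C", "D"}),
]
ABUNDANCE = {"A": 5, "B": 1, "C": 3, "D": 1}


def rarefied_pd(branches, abundance, n):
    """Expected PD of n individuals drawn without replacement (individual-based)."""
    total_n = sum(abundance.values())
    total = comb(total_n, n)
    expected = 0.0
    for length, tips in branches:
        n_i = sum(abundance.get(t, 0) for t in tips)  # individuals below branch
        expected += length * (1 - comb(total_n - n_i, n) / total)
    return expected


N = sum(ABUNDANCE.values())
curve = [rarefied_pd(TREE, ABUNDANCE, n) for n in range(1, N + 1)]
# Gain from the last individual: only branches carried by a single individual
# can still be missing at n = N - 1, so the slope is sum(L_i for N_i == 1) / N.
terminal_slope = curve[-1] - curve[-2]
print(curve[-1], terminal_slope)
```

In this example the singleton branches lead to B (length 1) and D (length 2), so the final slope is (1 + 2) / 10 = 0.3: a curve still rising at that rate suggests that more branch length remains to be sampled.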

It has recently been proposed that samples should be standardised for species diversity not by rarefaction to the same size (i.e. number of individuals) but by sample completeness (Alroy 2010; Jost 2010; Chao and Jost 2012). Completeness, when measured by a statistic known as coverage (Good 1953), is the proportion of individuals in a community that belong to species represented in a sample from that community (Chao and Jost 2012). When samples differ in their coverage, they should be standardised to equal coverage before a “fair” comparison can be made. Much like expected species richness, the coverage of a sample can be estimated from the sample size and the distribution of individuals among the species in the sample (Chao and Jost 2012). Given that standardisation by sample completeness has been shown to yield a less biased comparison of species richness between communities (Chao and Jost 2012), it would be desirable to have a similar method of standardisation for PD. Since rarefaction by coverage is mathematically related to rarefaction by sample size, the recent work on rarefying PD by sample size will no doubt form the basis from which coverage-based estimates of PD are developed.
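
Good's (1953) coverage estimate, 1 − f1/n (f1 = number of species represented by a single individual, n = number of individuals), illustrates the link between coverage and rarefaction by sample size: it equals one minus the slope of Hurlbert's species rarefaction curve at full sample size. The Python sketch uses only this simple estimator; Chao and Jost (2012) discuss refinements that are not shown here. The abundances are hypothetical.

```python
from math import comb

# Hypothetical abundances (individuals per species) in one sample.
ABUNDANCE = {"A": 5, "B": 1, "C": 3, "D": 1}


def good_coverage(abundance):
    """Good's (1953) estimate of sample coverage: 1 - f1 / n."""
    n = sum(abundance.values())
    f1 = sum(1 for count in abundance.values() if count == 1)  # singletons
    return 1 - f1 / n


def rarefied_richness(abundance, m):
    """Hurlbert's expected number of species in m individuals."""
    n = sum(abundance.values())
    return sum(1 - comb(n - count, m) / comb(n, m) for count in abundance.values())


n = sum(ABUNDANCE.values())
slope = rarefied_richness(ABUNDANCE, n) - rarefied_richness(ABUNDANCE, n - 1)
# The two values agree up to floating-point rounding.
print(good_coverage(ABUNDANCE), 1 - slope)
```

Because coverage is tied to the terminal slope of the richness curve in this way, a coverage-based standardisation for PD could plausibly be built on the slope of the PD rarefaction curve, although that extension is not attempted here.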

Finally, a general issue when considering any PD measure is uncertainty regarding the lengths of branches and the topology (branching pattern) of the tree. All PD measures (including those presented here) assume that the branch lengths and their arrangement in the tree are perfectly known. This is obviously an abstraction, although PD can be surprisingly robust to this source of variation (Swenson 2009). One solution to this dilemma is to calculate PD, including rarefied PD, for a large number of possible trees and report the mean and confidence limits. The output from a Bayesian phylogenetic analysis is a large number of trees, each with its own topology and corresponding branch lengths (see for example Jetz et al. 2012), and so lends itself well to this approach. However, when the possible trees number in the thousands or tens of thousands, this is computationally intensive. An analytical solution, directly incorporating uncertainty into the calculation, would therefore be desirable. This is not an easy extension of the PD rarefaction solution, because both variation in branch length and variation in topology (which affects the probability of encountering internal branches) would need to be taken into account. It is worth remembering that phylogenetic relationships are not the only source of uncertainty when investigating real ecological communities: neither the abundance nor even the presence (occupancy) of species is necessarily known with precision.
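
Summarising rarefied PD over a set of candidate trees can be sketched as follows. Each tree in a (here hypothetical, three-tree) sample carries its own topology and branch lengths, species-based rarefied PD is computed separately for each, and the mean and nearest-rank 2.5th and 97.5th percentiles are reported. With only three trees those percentiles are simply the minimum and maximum; a real Bayesian posterior sample would contain thousands of trees, which is where the computational cost noted above arises.

```python
from math import ceil, comb
from statistics import mean

# Hypothetical "posterior sample" of three rooted trees for tips A, B and C.
# Topology and branch lengths both vary; each tree is (length, tips) pairs.
TREES = [
    [(1.0, {"A"}), (1.0, {"B"}), (2.0, {"A", "B"}), (3.0, {"C"})],  # ((A,B),C)
    [(1.0, {"A"}), (1.0, {"B"}), (1.0, {"A", "B"}), (2.0, {"C"})],  # shorter
    [(1.0, {"A"}), (1.0, {"C"}), (2.0, {"A", "C"}), (3.0, {"B"})],  # ((A,C),B)
]


def expected_pd(branches, n_species, m):
    """Species-based rarefied PD of m species for one tree (root path included)."""
    total = comb(n_species, m)
    return sum(length * (1 - comb(n_species - len(tips), m) / total)
               for length, tips in branches)


def percentile(sorted_values, p):
    """Nearest-rank percentile of an already sorted list (0 < p <= 100)."""
    rank = max(1, ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


values = sorted(expected_pd(tree, 3, 2) for tree in TREES)
summary = (mean(values), percentile(values, 2.5), percentile(values, 97.5))
print(summary)  # (mean, lower limit, upper limit)
```

Nearest-rank percentiles were chosen for simplicity; an interpolating quantile method would give slightly different limits on small samples.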


The formulation for the rarefaction of Phylogenetic Diversity (PD) is given in expanded form to show its simplicity and its connection to the classic formula for the rarefaction of species richness (Hurlbert 1971; Simberloff 1972). The method is exact and efficient and should be preferred over the algorithmic (Monte Carlo) solution involving repeated random subsampling. Further, the extension to the calculation of ∆PD provides a flexible and general framework for measuring biodiversity as phylogenetic evenness, phylogenetic beta-diversity or phylogenetic dispersion. I hope that the applications of PD rarefaction and ∆PD presented here will improve understanding of the importance of rarefaction in ecology and guide future applications of the method. There are, I believe, exciting prospects for PD rarefaction, including as a general method for standardising PD by removing variation associated with species richness, and for predicting unseen (i.e. unsampled) PD. The recent availability of comprehensive phylogenies (Bininda-Emonds et al. 2007; Jetz et al. 2012) and rich data on species occurrences (Flemons et al. 2007), coupled with analytical advances such as PD rarefaction, allows us to better understand the distribution of Phylogenetic Diversity on the surface of the Earth and the processes giving rise to that distribution. This is valuable for its own sake but will also inform efforts to conserve as much of the Tree of Life as possible in the face of future extinctions (Rosauer and Mooers 2013).
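
The connection to Hurlbert's formula can be written out in one common notation, assumed here rather than copied from the formulation given earlier in the chapter: a sample of N individuals and a tree with B branches, where branch i has length L_i and N_i individuals belong to species descending from it. The first line is the expected PD of n individuals drawn without replacement; the second is the special case of a star tree with one unit-length branch per species (B = S, N_j individuals in species j), which reduces to Hurlbert's (1971) expected species richness.

```latex
\begin{aligned}
\mathbb{E}[\mathrm{PD}_n] &= \sum_{i=1}^{B} L_i \left[ 1 - \binom{N - N_i}{n} \Big/ \binom{N}{n} \right] \\
\mathbb{E}[S_n] &= \sum_{j=1}^{S} \left[ 1 - \binom{N - N_j}{n} \Big/ \binom{N}{n} \right] \qquad \text{(star tree, } L_j = 1\text{)}
\end{aligned}
```

Replacing individuals with species (N = S, and N_i = the number of tips below branch i) gives the species-based form used for ∆PD.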