1 Introduction

Phylogenetic trees provide a way to quantify biodiversity and the extent to which it might be lost in the face of the current mass extinction event. One such biodiversity measure is phylogenetic diversity (PD), which assigns to each subset S of extant species the sum of the branch lengths of the underlying evolutionary tree that connects (just) these species to the root of the tree. This measure, pioneered by Faith (1992), provides a more complete measure of biodiversity than merely counting the number of species in S (see e.g. Miller et al. 2018). Moreover, if new features evolve along the branches of a tree and are never lost, then the resulting features present amongst the species in the subset S (the feature diversity (FD) of S) are directly correlated with the phylogenetic diversity of S (Wicke et al. 2021). However, when features are lost, it has recently been shown mathematically that under simple (deterministic or stochastic) models of feature gain and loss, FD necessarily deviates from PD except for very trivial types of phylogenetic trees (Theorems 2 and 3 of Rosindell et al. (2022)). More generally, the question of the extent to which PD captures feature or functional diversity has been the subject of considerable debate in the biological literature (see Devictor et al. 2010; Mazel et al. 2017, 2018, 2019; Owen et al. 2019; Tucker et al. 2018, 2019).

In this paper, we investigate a related question: under a standard phylogenetic diversification model and a simple stochastic process of feature gain and loss, what proportion of feature diversity is expected to be lost in a mass extinction event at the present? And how does this ratio compare with the expected phylogenetic diversity that will be lost? Although the latter ratio (for PD) has been determined in earlier work, here we provide a corresponding result for FD, and show how it differs from PD when the rate of feature loss is non-zero. Our results suggest that the relative loss of FD under a mass extinction event at the present is greater than the relative loss of PD. We also investigate the number of features that are expected to be found in just one species at the present.

We begin by considering the properties of FD in a more general setting based purely on the species themselves (i.e. not involving any underlying phylogenetic tree or network) and then consider FD on fixed phylogenetic trees before presenting the results for (random) birth–death trees. Some of the results of these earlier sections are applied in the later sections.

2 General properties of feature diversity without reference to phylogenies

This section considers the generic properties of expected FD for sets of species; no underlying phylogenetic tree, tree-generating stochastic process, or model of feature evolution is assumed. We mostly follow the notation from Wicke et al. (2021).

Definitions

Let X be a labelled set of species, and for each \(x \in X\), let \(F_x\) be a non-empty set of discrete features (e.g. genes, genomic inserts, traits) that are associated with species x.

  • The collection of ordered pairs \({\mathbb {F}}=\{(x,F_x):x\in X\}\) is called a feature assignment on X.

  • For any subset A of X, let \({\mathcal {F}}(A)=\bigcup _{x\in A}F_x\) and let \(\mu :{\mathcal {F}}(X)\rightarrow {\mathbb {R}}^{>0}\) be a function assigning some richness or novelty to a feature \(f\in {\mathcal {F}}(X)\).

  • The feature diversity of some subset A of a set of species X is defined as,

    $$\begin{aligned} FD(A)=\sum _{f\in {\mathcal {F}}(A)}\mu (f). \end{aligned}$$
    (2.1)

A default option for \(\mu \) is to set \(\mu (f)=1\), which simply counts the number of features present. Notice that for any subset A of X we have \(FD(A)\le \sum _{x\in A}FD(\{x\}) = \sum _{x\in A}\sum _{f\in F_x}\mu (f),\) with equality if and only if no features are shared by any two species.
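The definition in Eq. (2.1) and the subadditivity bound just stated can be checked on a small example. The following is a minimal Python sketch, not part of the original development: the species labels, feature names and the helper `feature_diversity` are hypothetical and chosen only for illustration, with the default weight \(\mu (f)=1\).

```python
def feature_diversity(assignment, subset, mu=None):
    """FD(A): total weight of the features held by at least one species in A.

    assignment: dict mapping species -> set of features (the pairs (x, F_x)).
    subset:     iterable of species (the set A).
    mu:         optional dict feature -> positive weight; defaults to 1 per feature.
    """
    features = set()
    for x in subset:
        features |= assignment[x]
    if mu is None:
        return float(len(features))
    return float(sum(mu[f] for f in features))


# Hypothetical feature assignment on three species (illustration only).
F = {"a": {"f1", "f2"}, "b": {"f2", "f3"}, "c": {"f4"}}

fd_all = feature_diversity(F, F)                     # FD(X): union of all features
fd_sum = sum(feature_diversity(F, [x]) for x in F)   # sum of FD({x}); counts f2 twice
```

Here `fd_all` is smaller than `fd_sum` because species a and b share the feature f2, whereas for the pair a, c (which share nothing) the two quantities coincide.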

2.1 Feature diversity loss under a ‘Field of Bullets’ model of extinction at the present

Definition 2.1

Consider a sudden extinction event taking place across a set of species (Raup 1993). In the generalised field of bullets (g-FOB) model, each species \(x\in X\) either survives the extinction event (with probability \(s_x\)) or disappears (with probability \(1-s_x\)), and these survival events are assumed to be independent among the species. We write \(\textbf{s}(X)\) (or, more briefly, \(\textbf{s}\)) to denote the vector \((s_x: x\in X)\) and we let \({\mathcal {X}}\) denote the (random) subset of X corresponding to the species that survive the extinction event. If \(s_x=s\) for all \(x\in X\), then we have the simpler field of bullets (FOB) model.

We define the following quantity:

$$\begin{aligned} \varphi _{({\mathbb {F}},\textbf{s})} = \frac{FD({\mathcal {X}})}{FD(X)}, \end{aligned}$$

which is the proportion of feature diversity that survives the extinction event. In the case of the FOB model, we denote this ratio by \(\varphi _{({\mathbb {F}}, s)}\).

Proposition 2.1

For the g-FOB model,

$$\begin{aligned} {\mathbb {E}}[\varphi _{({\mathbb {F}},\textbf{s})}] = \sum _{f \in {\mathcal {F}}(X)} {\tilde{\mu }}(f)\left( 1-\prod _{x: f \in F_x}(1-s_x)\right) , \end{aligned}$$

where \({\tilde{\mu }}(f) = \frac{\mu (f)}{\sum _{f \in {\mathcal {F}}(X)} \mu (f)}\) are the normalised \(\mu \) values (which sum to 1).

For the FOB model, the equation in Proposition 2.1 simplifies to

$$\begin{aligned} {\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}] = \sum _{f \in {\mathcal {F}}(X)} {\tilde{\mu }}(f)\left( 1-(1-s)^{n_f}\right) , \end{aligned}$$
(2.2)

where \(n_f=|\{x \in X: f \in F_x\}|\) is the number of species in X that possess feature f.

Proof

We have \(\varphi _{({\mathbb {F}},\textbf{s})} =\sum _{f \in {\mathcal {F}}(X)} {\tilde{\mu }}(f) \cdot {\mathbb {I}}_f\) where \({\mathbb {I}}_f\) is the Bernoulli variable that takes the value 1 if at least one species in X with feature f survives the g-FOB extinction event, and 0 otherwise. Applying linearity of expectation and noting that \({\mathbb {P}}({\mathbb {I}}_f = 1) = 1-\prod _{x: f \in F_x}(1-s_x)\) gives the result. \(\square \)
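The formula in Proposition 2.1 can be compared against a direct enumeration of all survival patterns on a small example. The sketch below is illustrative only: the function names, the feature assignment `F` and the survival probabilities `s` are assumptions made for this example, and the brute-force routine is feasible only for a handful of species.

```python
from itertools import product
from math import prod


def normalised_weights(assignment, mu=None):
    """Return the normalised weights mu~(f), which sum to 1 (mu(f) = 1 by default)."""
    features = set().union(*assignment.values())
    weight = {f: 1.0 if mu is None else mu[f] for f in features}
    total = sum(weight.values())
    return {f: w / total for f, w in weight.items()}


def expected_phi_gfob(assignment, s, mu=None):
    """E[phi] from Proposition 2.1: sum_f mu~(f) * (1 - prod_{x: f in F_x} (1 - s_x))."""
    w = normalised_weights(assignment, mu)
    value = 0.0
    for f, wf in w.items():
        p_lost = prod(1.0 - s[x] for x in assignment if f in assignment[x])
        value += wf * (1.0 - p_lost)
    return value


def expected_phi_bruteforce(assignment, s, mu=None):
    """E[phi] by summing over all 2^|X| survival patterns (small X only)."""
    w = normalised_weights(assignment, mu)
    species = list(assignment)
    value = 0.0
    for pattern in product([0, 1], repeat=len(species)):
        prob = prod(s[x] if alive else 1.0 - s[x] for x, alive in zip(species, pattern))
        surviving = set()
        for x, alive in zip(species, pattern):
            if alive:
                surviving |= assignment[x]
        value += prob * sum(w[f] for f in surviving)
    return value


# Hypothetical inputs (illustration only).
F = {"a": {"f1", "f2"}, "b": {"f2", "f3"}, "c": {"f4"}}
s = {"a": 0.9, "b": 0.5, "c": 0.2}
```

The two routines should agree up to floating-point error; the closed form needs one product per feature, while the enumeration grows exponentially in the number of species.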

Notice that \({\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}]= s\) at \(s=0,1\). The behaviour between these two extreme values of s is described next.

Proposition 2.2

Under the FOB model, the following hold:

  (i) \({\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}] \ge s\) for all \(s\in [0,1]\).

  (ii) \({\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}]= s\) for a value of \(s \in (0,1)\) if and only if the sets \(F_x\) in \({\mathbb {F}}\) are pairwise disjoint, in which case \({\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}]= s\) for all \(s \in [0,1]\).

  (iii) If the sets \(F_x\) are not pairwise disjoint, then \({\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}]\) is a strictly concave increasing function of s.

Proof

Part (i): Since \(n_f \ge 1\) in Eq. (2.2) and \(1-(1-s)^{n_f} \ge s\) for all \(n_f \ge 1\), the claimed inequality is immediate. Part (ii): If \(n_f =1\) then \(1-(1-s)^{n_f} =s\) for all \(s \in [0,1]\), and if \(n_f>1\) then \(1-(1-s)^{n_f} >s\) for every \(s \in (0,1)\). Part (iii): By Eq. (2.2),

$$\begin{aligned} \frac{d}{ds} {\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}] = \sum _{f \in {\mathcal {F}}(X)} {\tilde{\mu }}(f)n_f (1-s)^{n_f-1}> 0, \end{aligned}$$

and

$$\begin{aligned} \frac{d^2}{ds^2} {\mathbb {E}}[\varphi _{({\mathbb {F}}, s)}] = -\sum _{f \in {\mathcal {F}}(X): n_f \ge 2} {\tilde{\mu }}(f)n_f(n_f-1)(1-s)^{n_f-2} <0, \end{aligned}$$

for all \(s \in (0,1)\), from which the claimed results follow. \(\square \)
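The three parts of Proposition 2.2 can be observed numerically from Eq. (2.2). The following sketch assumes \(\mu (f)=1\) and uses two hypothetical lists of \(n_f\) values (one with shared features, one with pairwise disjoint feature sets); the names `expected_phi_fob`, `shared` and `disjoint` are introduced here for illustration only.

```python
def expected_phi_fob(counts, s):
    """Eq. (2.2) with mu(f) = 1: the mean over features of 1 - (1 - s)^{n_f}.

    counts: list of n_f values, one per feature (each n_f >= 1).
    """
    return sum(1.0 - (1.0 - s) ** n for n in counts) / len(counts)


grid = [i / 20 for i in range(21)]   # s = 0, 0.05, ..., 1

shared = [1, 2, 1, 3]     # hypothetical n_f values; some features are shared
disjoint = [1, 1, 1]      # pairwise disjoint feature sets: every n_f equals 1

vals = [expected_phi_fob(shared, s) for s in grid]

# Second differences on an evenly spaced grid; negative values indicate concavity.
second_diffs = [vals[i - 1] - 2 * vals[i] + vals[i + 1] for i in range(1, 20)]
```

On this grid the values for `shared` lie on or above the diagonal, increase with s, and have negative second differences, while the values for `disjoint` reproduce s itself.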

2.2 Approximating \(\varphi _{({\mathbb {F}}, s)}\) by its expected value

For each \(n\ge 1\), let \(X_n\) be a labelled set of n species with feature assignment \({\mathbb {F}}_n\). Let \({\mathcal {X}}_n\) denote the (random) set of species after a g-FOB extinction event with survival probability vector \(\textbf{s}(X_n)\), which assigns each \(x\in X_n\) a corresponding survival probability \(s_n(x)\). Note that we make no assumption regarding how the species in \(X_n\) and \(X_{m}\) are related (e.g. they may be disjoint, overlapping or nested), or any a priori relationship between \(\textbf{s}(X_n)\) and \(\textbf{s}(X_m)\).

In the following result, we provide a sufficient condition under which \(\varphi _{({\mathbb {F}}_n, \textbf{s}_n)}\) is likely to be close to its expected value \({\mathbb {E}}[\varphi _{({\mathbb {F}}_n, \textbf{s}_n)}]\) (readily computed via Proposition 2.1) when the number of species n is large. This condition allows some species to contribute proportionately more FD than other species do on average, and this proportion can grow with n, provided that it does not grow too quickly.

Proposition 2.3

  (a) For \(\epsilon >0\),

    $$\begin{aligned} {\mathbb {P}}\left( \left| \varphi _{({\mathbb {F}}_n, \textbf{s}_n)} - {\mathbb {E}}[\varphi _{({\mathbb {F}}_n, \textbf{s}_n)}] \right| > \epsilon \right) \le 2\exp \left( -\frac{2\epsilon ^2}{R_n}\right) , \end{aligned}$$

    where \(R_n = \sum _{x \in X_n} \left[ \frac{FD(\{x\})}{FD(X_n)}\right] ^2\).

  (b) If \(R_n \rightarrow 0\) as \(n\rightarrow \infty \) then \(\varphi _{({\mathbb {F}}_n, \textbf{s}_n)} - {\mathbb {E}}[\varphi _{({\mathbb {F}}_n, \textbf{s}_n)}] \xrightarrow { P }0\).

  (c) Let \(\textrm{av}FD(X_n) = FD(X_n)/n\) (the average contribution of each species to the total FD), and suppose that for each \(x \in X_n\), \(FD(\{x\}) /\textrm{av}FD(X_n) \le B_n\), where \(B_n^2 = o(n)\) (e.g. \(B_n = \root 3 \of {n}\)). We then have the following convergence in probability as \(n \rightarrow \infty \):

    $$\begin{aligned} \varphi _{({\mathbb {F}}_n, \textbf{s}_n)} - {\mathbb {E}}[\varphi _{({\mathbb {F}}_n, \textbf{s}_n)}] \xrightarrow { P }0. \end{aligned}$$

Proof

Let \(\mathbf{Y_n}=\{Y_i:i\in [n]\}\) be a sequence of Bernoulli random variables where each \(Y_i\) takes the value 1 if species \(x_i\) survives and 0 otherwise. For the g-FOB model, the random variables \(Y_i\) are independent. We can write \(\varphi _{({\mathbb {F}}_n, \textbf{s}_n)} = h(\mathbf{Y_n})\) where \(h(y_1, \ldots , y_n)\) is the ratio \(\frac{FD(\{x_i \in X_n: y_i=1\})}{FD(X_n)}.\) Observe that for any particular value of \(i \in \{1, \ldots , n\}\), if we change \(y_i\) (from 0 to 1 or vice versa) to give \(y_i'\) then:

$$\begin{aligned} |h(y_1,...,y_i,...,y_n)-h(y_1,...,y_i',...,y_n)|\le \frac{FD(\{x_i\})}{FD(X_n)}. \end{aligned}$$

We now apply McDiarmid’s inequality (McDiarmid 1989) to obtain (for each \(\epsilon >0\)):

$$\begin{aligned} {\mathbb {P}}(\left| \varphi _{({\mathbb {F}}_n, \textbf{s}_n)}-{\mathbb {E}}[\varphi _{({\mathbb {F}}_n, \textbf{s}_n)}]\right| \ge \epsilon ) \le 2\exp \left( \frac{-2\epsilon ^2}{R_n}\right) . \end{aligned}$$
(2.3)

This establishes Part (a). Part (b) now follows immediately, and Part (c) follows from Part (b), since the condition in Part (c) implies that

$$\begin{aligned} R_n = \sum _{x} \left[ \frac{FD(\{x\})}{n \cdot \textrm{av}FD(X_n)}\right] ^2 \le \sum _x\left( \frac{B_n}{n}\right) ^2 = \sum _x \frac{B_n^2}{n^2} = B_n^2/n \rightarrow 0, \end{aligned}$$

as \(n \rightarrow \infty \). \(\square \)
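To get a feel for the size of the bound in Proposition 2.3(a), the quantity \(R_n\) and the resulting tail bound can be evaluated directly. The sketch below is an illustration under assumed inputs: the helper names are hypothetical, and the example takes n species with pairwise disjoint feature sets of equal size, for which \(R_n = 1/n\).

```python
from math import exp


def r_n(single_fd, total_fd):
    """R_n = sum over species x of (FD({x}) / FD(X_n))^2."""
    return sum((v / total_fd) ** 2 for v in single_fd)


def mcdiarmid_bound(eps, rn):
    """Right-hand side of Proposition 2.3(a): 2 * exp(-2 * eps^2 / R_n)."""
    return 2.0 * exp(-2.0 * eps ** 2 / rn)


# Hypothetical example: 1000 species, each with 3 private features (mu = 1),
# so FD({x}) = 3 for every x and FD(X_n) = 3000.
n = 1000
rn = r_n([3.0] * n, 3.0 * n)
bound = mcdiarmid_bound(0.1, rn)
```

When one species carries a fixed fraction of the total FD, \(R_n\) stays bounded away from 0 and the bound no longer shrinks with n.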

Remark

Note that Proposition 2.3(c) can fail when the condition stated in Part (c) does not hold, even for the simpler FOB model. We provide a simple example to demonstrate this. Let \(X_n = \{1, \ldots , n\}\) and let \(F_i =\{f\}\) for \(i=1, \ldots , n-1\) and \(F_n = \{g\}\) where f and g are distinct features with \(\mu (f)=\mu (g)=1\). In this case:

$$\begin{aligned} \varphi _{({\mathbb {F}}_n,s)}= {\left\{ \begin{array}{ll} 0, &{} \text{ w.p. } (1-s)^n;\\ \frac{1}{2}, &{} \text{ w.p. } (1-s)^{n-1}s + (1-(1-s)^{n-1})(1-s);\\ 1, &{} \text{ w.p. } (1-(1-s)^{n-1}) s. \end{array}\right. } \end{aligned}$$

Therefore, \(\varphi _{({\mathbb {F}}_n, s)} - {\mathbb {E}}[\varphi _{({\mathbb {F}}_n, s)}]\) does not converge in probability to 0 (or to any constant) as \(n\rightarrow \infty \).
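The failure of concentration in this example can be made concrete by evaluating the three probabilities above for large n. The sketch below is illustrative only (the function names are hypothetical); it computes the exact distribution and its variance, which for large n approaches \(s(1-s)/4\) rather than 0, since feature f almost surely survives while feature g survives with probability s.

```python
def phi_distribution(n, s):
    """Exact law of phi when F_1 = ... = F_{n-1} = {f} and F_n = {g}, under FOB."""
    f_lost = (1.0 - s) ** (n - 1)   # none of the n-1 species carrying f survives
    g_lost = 1.0 - s                # species n (the only carrier of g) dies
    return {
        0.0: f_lost * g_lost,
        0.5: f_lost * (1.0 - g_lost) + (1.0 - f_lost) * g_lost,
        1.0: (1.0 - f_lost) * (1.0 - g_lost),
    }


def variance(dist):
    """Variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum(p * (v - mean) ** 2 for v, p in dist.items())


small = phi_distribution(5, 0.3)
large = phi_distribution(10 ** 4, 0.5)
```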

2.3 Consequences for phylogenetic diversity

Proposition 2.2 and Proposition 2.3 provide a simple way to derive certain results concerning phylogenetic diversity, both on rooted trees and on rooted phylogenetic networks (specifically for the subNet diversity measure described in Wicke and Fischer (2018)). To each edge e of a rooted phylogenetic tree (or network), associate some unique feature \(f_e\) and give it the value \(\mu (f_e)=\ell (e)\), where \(\ell (e)\) is the length of edge e in the tree (or network). For any subset Y of X (the leaf set of the tree or network) we then have:

$$\begin{aligned} PD(Y) = FD(Y). \end{aligned}$$

It follows that for the simple field of bullets models, PD satisfies the concavity properties described in Proposition 2.2, where the condition that the sets \(F_x\) are pairwise disjoint corresponds to the tree being a star tree. These results were established for rooted trees by specific tree-based arguments (see e.g. Sect. 5 of Lambert and Steel (2013)), but they directly follow from the more general framework above, and extend beyond trees.

Using this same link between PD and FD, Proposition 2.3 provides a further application to any sequence of rooted phylogenetic trees \(T_n\) with n leaves and ultrametric edge lengths. For example, the ratio of surviving PD to original PD under the FOB model converges in probability to the expected value of this ratio for a sequence of trees \(T_n\) if the total PD of \(T_n\) grows at least as fast as \(nL/n^\beta \), where L is the height of the tree and \(0<\beta <1/2\). This condition holds, for example, for Yule trees (Stadler and Steel 2012).
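The edge-to-feature encoding used in this subsection can be sketched in a few lines for a rooted tree. The tree, its edge lengths and the helper names below are hypothetical; each edge is identified by its child endpoint, so the feature \(f_e\) is represented by that node's label and \(\mu (f_e)\) by the corresponding entry of `length`.

```python
def edge_features(parent, leaf):
    """F_x for a leaf x: one feature per edge on the path from x up to the root.

    parent: dict child -> parent (the root has no entry).
    """
    feats, node = set(), leaf
    while node in parent:
        feats.add(node)          # the edge (parent[node], node), labelled by its child
        node = parent[node]
    return feats


# Hypothetical rooted tree: root -> u (length 1.0); u -> a and u -> b (length 2.0 each);
# root -> c (length 3.0).
parent = {"u": "root", "a": "u", "b": "u", "c": "root"}
length = {"u": 1.0, "a": 2.0, "b": 2.0, "c": 3.0}

F = {x: edge_features(parent, x) for x in ("a", "b", "c")}


def pd(subset):
    """PD(Y), computed as FD(Y) with mu(f_e) equal to the length of edge e."""
    edges = set().union(*(F[x] for x in subset))
    return sum(length[e] for e in edges)
```

For instance, `pd(["a", "b"])` counts the shared edge into u only once, which is exactly the union in the definition of FD.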

3 The feature diversity ratio \(\varphi \) for a model of feature evolution on a phylogenetic tree

Consider a rooted binary phylogenetic tree \(T_n\), in which each edge has a positive length that corresponds to a temporal duration, with the root \(\rho \) of \(T_n\) being placed at the top of a stem edge at time 0, and with each leaf in the leaf set \(X_n=\{x_1, \ldots , x_n\}\) of \(T_n\) being placed at time t (as in Fig. 1). For convenience, we will assume in this section that \(\mu (f)=1\) for all features; however, this assumption can be relaxed (e.g. by allowing \(\mu (f)\) to take values in a fixed interval [a, b] where \(a>0\) according to some fixed distribution, and independently between features) without altering the results significantly.

We let \(F_\rho \) denote the (possibly empty) set of features present at time 0 (i.e. at the top of the stem edge), and we assume throughout this section that \(|F_\rho |\) is bounded by some fixed constant B, independent of n.

On \(T_n\), we apply a stochastic process in which (discrete) features arise independently along the branches of this tree at rate r, and each feature that arises is novel (i.e. it has not appeared earlier elsewhere in the tree). Once a feature arises, it is then carried forward in time along the branches of \(T_n\) (and is passed on to the two lineages arising at any speciation event). In addition, any feature can be lost from a lineage at any point according to a continuous-time pure-death process that operates at rate \(\nu \). This model was investigated in a different setting in Huson and Steel (2004) and studied more recently in Rosindell et al. (2022). Under this process, each leaf x of \(T_n\) will have a (possibly empty) set of features (\(F_x\)). Fig. 1 illustrates the processes described. Note that \(|F_x|\) (for any \(x \in X_n\)) and \(FD(X_n)\) are now random variables.

Fig. 1

An example of feature gain and loss on a tree and the impact of extinction at the present. Left: New features arise (indicated by \(+\)) and disappear (indicated by −) along the branches of the tree. To simplify this example, no features are present at time 0; however, our results do not require this restriction. In total, there are five features present amongst the leaves of the tree at time t, namely \(\{\alpha ,\beta ,\gamma ,\delta ,\epsilon \}\). Right: An extinction event at the present (denoted by \(\dagger \)) results in the loss of three of the extant species, leaving just three features being present among the leaves of the resulting pruned tree, namely \(\{\alpha ,\beta ,\gamma \}\). Thus, the ratio of surviving features to total features is \(3/5= 0.6\)

Let \(N_\ell \) denote the number of features at the end of any path P in \(T_n\) that starts at time \(t=0\) and ends at time \(\ell \). Then \(N_\ell \) is described by a continuous-time Markov process that has a constant birth rate and a linear death rate. It is then a classical result (Feller 1950) that \(N_\ell \) has expected value \(\frac{r}{\nu }(1-e^{-\nu \ell })+F_0 e^{-\nu \ell }\), where \(F_0=|F_\rho |\) (the number of features present at time 0), and \(N_\ell \) converges to a Poisson distribution with mean \(\frac{r}{\nu }\) as \(\ell \) grows. Moreover, if \(N_0=0\) then \(N_\ell \) has a Poisson distribution for any value of \(\ell >0\) (Feller 1950, p. 461), and so, regardless of the value of \(F_\rho \), the random variable \(|F_x\setminus F_\rho |\) has a Poisson distribution with mean \(\frac{r}{\nu }(1-e^{-\nu \ell (x)})\) where \(\ell (x)\) is the length of the path from the root to leaf x. The (random) number of features at any leaf x of \(T_n\) (i.e. \(|F_x|\)) has the same distribution as \(N_{\ell (x)}\). Note that for distinct leaves x and y of \(T_n\), the random variables \(|F_x|\) and \(|F_y|\) are not independent.
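The expression for the expected value of \(N_\ell \) can be sanity-checked numerically. The sketch below (with a hypothetical function name and arbitrary parameter values) evaluates the formula and checks, by a central finite difference, that it satisfies the mean equation \(m'(\ell ) = r - \nu m(\ell )\) with \(m(0)=F_0\), which is the standard mean equation for a process with constant birth rate r and per-feature death rate \(\nu \); this check is an added illustration rather than part of the original argument.

```python
from math import exp


def expected_features(r, nu, ell, f0=0):
    """E[N_ell] = (r/nu) * (1 - e^{-nu*ell}) + F_0 * e^{-nu*ell}."""
    decay = exp(-nu * ell)
    return (r / nu) * (1.0 - decay) + f0 * decay


# Arbitrary illustrative parameters (assumptions, not values from the text).
r, nu, f0 = 2.0, 0.5, 3

# Central finite difference of the mean at ell = 1, compared with r - nu * m(1).
h = 1e-5
slope = (expected_features(r, nu, 1.0 + h, f0) - expected_features(r, nu, 1.0 - h, f0)) / (2 * h)
rhs = r - nu * expected_features(r, nu, 1.0, f0)
```

As \(\ell \) grows the value approaches \(r/\nu \) regardless of \(F_0\), in line with the Poisson limit stated above.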

Notational convention: Henceforth we will write \(FD(T_n)\) in place of \(FD(X_n)\) and we will also write \(FD(T_n^s)\) in place of \(FD({\mathcal {X}}_n)\) when \({\mathcal {X}}_n\) is the subset of the set of leaves \(X_n\) of \(T_n\) that survive under a FOB model with a survival probability s. We will let \(F_\rho \) denote the set of features present at the root vertex \(\rho \) at the top of the stem edge.

Lemma 3.1

Set \(F_\rho =\emptyset \). Then for any value of \(n\ge 1\) the following hold:

  (i) The random variable \(FD(T_n)\) has a Poisson distribution.

  (ii) The expected value of \(FD(T_n)\) satisfies the following bound:

    $$\begin{aligned} {\mathbb {E}}[FD(T_n)] \le \frac{r}{\nu } n. \end{aligned}$$

Proof

Part (i): We use induction on n. Since \(F_\rho = \emptyset \), the result for base case (\(n=1\)) holds by the results mentioned in the previous paragraph. Thus, suppose that \(n\ge 2\), and let \(T_n\) be a binary tree with n leaves and with \(F_\rho = \emptyset \). Then:

$$\begin{aligned} FD(T_{n})=FD(T^1)+FD(T^2)+G, \end{aligned}$$

where the trees \(T^i\) (with \(i=1,2\)) are subtrees of \(T_n\) obtained by deleting the stem edge and its endpoints (and setting the set of features at the top of the stem edge of \(T^1\) and \(T^2\) equal to the empty set), and G is the number of features that arise on the stem edge of \(T_n\) and are present in at least one leaf of \(T_n\).

Notice that \(FD(T^1)\), \(FD(T^2)\) and G are independent random variables, and, by induction, \(FD(T^1)\) and \(FD(T^2)\) each have a Poisson distribution. Conditional on the number X of features that are present at the end of the stem edge, G has a binomial distribution with parameters X and p where p is the probability that a single feature present at the end of the stem edge is present in at least one leaf of \(T_n\). Since X has a Poisson distribution, and the sum of a Poisson-distributed number of independent and identically distributed Bernoulli random variables is again Poisson, G has a Poisson distribution. It follows that \(FD(T_n)\), being the sum of three independent Poisson variables, also has a Poisson distribution. This establishes the induction step and thus Part (i).

Part (ii): We have:

$$\begin{aligned} FD(T_n) = \left| \bigcup _{x\in X_n}F_x\right| \le \sum _{x \in X_n}|F_x|. \end{aligned}$$

Now, for each \(x \in X_n\), we have \({\mathbb {E}}[|F_x|] = \frac{r}{\nu }(1-e^{-\nu L_x})\) where \(L_x\) denotes the length of the path from the top of the stem edge of \(T_n\) to the leaf x. Thus \({\mathbb {E}}[FD(T_n)] \le \frac{r}{\nu }\cdot n.\) \(\square \)

Example (\(n=2\)) Consider the process described on \(T_2\) with \(F_\rho =\emptyset \). Let \(\ell _0\) denote the length of the stem edge, and \(\ell \) the length of each of the two pendant edges. Then \(FD(T_2)\) has a Poisson distribution with expected value

$$\begin{aligned} \frac{r}{\nu }(1-e^{-\nu \ell _0})(1-(1-e^{-\nu \ell })^2)+ 2\frac{r}{\nu }(1-e^{-\nu \ell }) \end{aligned}$$

and \(FD(T_2^s)\) has expected value

$$\begin{aligned} s^2 {\mathbb {E}}[FD(T_2)] + 2s(1-s) \left[ \frac{r}{\nu }(1-e^{-\nu \ell _0})e^{-\nu \ell } + \frac{r}{\nu }(1-e^{-\nu \ell })\right] . \end{aligned}$$

In particular,

$$\begin{aligned} \frac{{\mathbb {E}}[FD(T_2^s)]}{{\mathbb {E}}[FD(T_2)]} = s \cdot \left[ s + 2(1-s) \frac{1-e^{-\nu (\ell _0+\ell )}}{(1-e^{-\nu \ell _0})(1-(1-e^{-\nu \ell })^2)+ 2(1-e^{-\nu \ell })} \right] \end{aligned}$$

When \(\ell _0=0\), the right-hand side of this equation equals s, but for all other values it is strictly greater than s. Moreover, by differentiating \(\frac{{\mathbb {E}}[FD(T_2^s)]}{{\mathbb {E}}[FD(T_2)]}\) it can be verified that this ratio is monotone decreasing as \(\nu \) increases for all positive values of \(\ell \) and \(\ell _0\); in particular, for \(s \in (0,1)\), and \(\nu >0\), this ratio is always less than the expected proportion of PD that survives in \(T_2\).
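The claims made for this two-leaf example can be checked numerically. In the sketch below, `fd_ratio_t2` evaluates the displayed ratio, and `pd_ratio_t2` is the corresponding expected PD ratio for \(T_2\), obtained here as the \(\nu \rightarrow 0\) limit of the same expression (an assumption of this illustration, using the convention that no PD survives when both leaves are lost). Both function names and the parameter values are hypothetical.

```python
from math import exp


def fd_ratio_t2(s, nu, l0, l):
    """E[FD(T_2^s)] / E[FD(T_2)] for stem length l0 and pendant length l (F_rho empty)."""
    a = 1.0 - exp(-nu * l0)                # stem-edge term 1 - e^{-nu*l0}
    b = 1.0 - exp(-nu * l)                 # pendant-edge term 1 - e^{-nu*l}
    both = a * (1.0 - b ** 2) + 2.0 * b    # E[FD(T_2)] divided by r/nu
    one = 1.0 - exp(-nu * (l0 + l))        # expected features at a single leaf, divided by r/nu
    return s * (s + 2.0 * (1.0 - s) * one / both)


def pd_ratio_t2(s, l0, l):
    """Expected surviving PD over total PD for T_2 (limit of fd_ratio_t2 as nu -> 0)."""
    return s * (s + 2.0 * (1.0 - s) * (l0 + l) / (l0 + 2.0 * l))


s = 0.5
ratios = [fd_ratio_t2(s, nu, 1.0, 1.0) for nu in (0.1, 1.0, 5.0)]
pd_value = pd_ratio_t2(s, 1.0, 1.0)
```

With these illustrative values, the entries of `ratios` decrease as \(\nu \) increases while staying above s and below `pd_value`, matching the behaviour described above.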

3.1 A limit result for sequences of trees

The main result of this section is Theorem 3.1, and its proof relies on establishing a sequence of preliminary lemmas.

Lemma 3.2

Let \(\beta >0\) be a fixed constant. Given a sequence \(T_n\) of trees with leaf set \(X_n\), let \({\mathcal {E}}_n\) be the event that \(|F_x| \le n^\beta \) for every \(x \in X_n\). Then \(\lim _{n \rightarrow \infty } {\mathbb {P}}({\mathcal {E}}_n) = 1\).

Proof

We combine the Bonferroni inequality with a standard right-tail probability bound for a Poisson variable. Firstly, observe that:

$$\begin{aligned} {\mathbb {P}}({\mathcal {E}}_n) \ge 1- \sum _{x \in X_n}{\mathbb {P}}(|F_x|> n^\beta ) = 1 - n{\mathbb {P}}(|F_{x_1}| > n^\beta ). \end{aligned}$$
(3.1)

Now, \(|F_{x_1}|\) can be written as the sum of two independent random variables \(U_{x_1}+V_{x_1}\) where \(U_{x_1}\) has a Poisson distribution with mean \(m \le r/\nu \), and \(V_{x_1}\le |F_\rho | \le B\) with probability 1 (\(U_{x_1}\) counts the features at \(x_1\) if \(F_\rho = \emptyset \), \(V_{x_1}\) counts the features in \(F_\rho \) that remain present at \(x_1\), and B is the global bound on \(F_\rho \) described near the start of Sect. 3). Thus, the Chernoff bound on the right-hand tail of a Poisson variable (Mitzenmacher and Upfal 2005, p. 97) gives \({\mathbb {P}}(|U_{x_1}| >n^\beta ) \le \left( \frac{em}{n^\beta }\right) ^{n^\beta } e^{-m},\) and so \(n{\mathbb {P}}(|F_{x_1}| >n^\beta ) \rightarrow 0\) as n grows. Applying this to the inequality in (3.1) establishes the result. \(\square \)
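The decay of the union bound used in this proof can be illustrated numerically for the case \(F_\rho =\emptyset \). The sketch below evaluates \(n\left( \frac{em}{n^\beta }\right) ^{n^\beta }e^{-m}\) in logarithms to avoid overflow; the function name and the values \(\beta = 1/3\), \(m=2\) are assumptions chosen for this illustration, and the expression is only used once \(n^\beta > m\).

```python
from math import exp, log


def union_bound(n, beta, m):
    """n * (e*m / n^beta)^(n^beta) * e^{-m}, evaluated via logarithms.

    Requires n^beta > m > 0, the regime in which the Chernoff tail bound is applied.
    """
    k = n ** beta
    if k <= m:
        raise ValueError("need n**beta > m for this tail bound")
    log_bound = log(n) + k * (1.0 + log(m) - log(k)) - m
    return exp(log_bound)


# Illustrative values: beta = 1/3 and Poisson mean m = 2.
bounds = [union_bound(n, 1 / 3, 2.0) for n in (10 ** 3, 10 ** 4, 10 ** 6)]
```

The bound is of order 0.3 at \(n=10^3\) for these values and then falls off faster than any power of n, which is what drives \({\mathbb {P}}({\mathcal {E}}_n)\) to 1.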

Lemma 3.3

Let \((T_n, n\ge 1)\) be a sequence of rooted binary trees, with \(T_n\) having leaf set \(X_n\), and suppose that \({\mathbb {E}}[FD(T_n)] \ge cn\) for some constant \(c>0\). Then \(\frac{FD(T_n)}{{\mathbb {E}}[FD(T_n)]}\) converges in probability to 1 as \(n\rightarrow \infty \).

Proof

First observe that we can write \(FD(T_n)\) as the sum of two independent random variables, namely \(FD_0(T_n) +K\), where \(FD_0(T_n)\) is the FD value when \(F_\rho =\emptyset \), and K is the number of features present at the root that are also present in at least one leaf. In particular, \(K \le |F_\rho |\) which is assumed to be bounded by a constant B (independent of n). Thus

$$\begin{aligned} \textrm{Var}[FD(T_n)] = \textrm{Var}[FD_0(T_n)] +\textrm{Var}[K] \le \textrm{Var}[FD_0(T_n)] + B. \end{aligned}$$

By Lemma 3.1, \(\textrm{Var}[FD_0(T_n)] = {\mathbb {E}}[FD_0(T_n)]\le \frac{r}{\nu } \cdot n\), so \(\textrm{Var}[FD(T_n)] \le \frac{r}{\nu } \cdot n +B\), and thus:

$$\begin{aligned} \textrm{Var}\left[ \frac{FD(T_n)}{n}\right] =\textrm{Var}[FD(T_n)]/n^2 \rightarrow 0, \end{aligned}$$
(3.2)

as \(n \rightarrow \infty \). We now apply Chebyshev’s inequality to obtain:

$$\begin{aligned} {\mathbb {P}}\left( \left| \frac{FD(T_n)}{{\mathbb {E}}[FD(T_n)]}-1\right| \ge \epsilon \right) \le \textrm{Var}\left( \frac{FD(T_n)}{{\mathbb {E}}[FD(T_n)]}\right) \big /\epsilon ^2, \end{aligned}$$
(3.3)

and since

$$\begin{aligned} \frac{FD(T_n)}{{\mathbb {E}}[FD(T_n)]} = \frac{FD(T_n)}{n} \cdot \frac{n}{{\mathbb {E}}[FD(T_n)]} \end{aligned}$$

we have:

$$\begin{aligned} \textrm{Var}\left[ \frac{FD(T_n)}{{\mathbb {E}}[FD(T_n)]}\right] = \textrm{Var}\left[ \frac{FD(T_n)}{n}\right] \cdot \left( \frac{n}{{\mathbb {E}}[FD(T_n)]} \right) ^2 \end{aligned}$$

and so the term on the right of Eq. (3.3) is bounded above by \(\textrm{Var}\left[ \frac{FD(T_n)}{n}\right] \cdot \frac{1}{\epsilon ^2c^2}\), which converges to 0 as \(n \rightarrow \infty \) by Eq. (3.2). \(\square \)
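Chaining the inequalities in this proof gives the explicit bound \(\left( \frac{r}{\nu } n + B\right) /(\epsilon ^2c^2n^2)\) on the probability in Eq. (3.3). The following sketch simply evaluates that expression; the function name and the parameter values are hypothetical and chosen only to show the \(1/n\) decay.

```python
def chebyshev_bound(n, eps, c, r, nu, B=0):
    """(r/nu * n + B) / (eps^2 * c^2 * n^2): the bound assembled in the proof above.

    Uses Var[FD(T_n)] <= (r/nu) * n + B and E[FD(T_n)] >= c * n.
    """
    var_upper = (r / nu) * n + B
    return var_upper / (eps ** 2 * c ** 2 * n ** 2)


# Illustrative parameters (assumptions): eps = 0.05, c = 0.5, r = nu = 1, B = 3.
b_small = chebyshev_bound(10 ** 5, 0.05, 0.5, 1.0, 1.0, B=3)
b_large = chebyshev_bound(10 ** 6, 0.05, 0.5, 1.0, 1.0, B=3)
```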

Lemma 3.4

Let \((T_n, n\ge 1)\) be a sequence of rooted binary trees, with \(T_n\) having leaf set \(X_n\), and suppose that \({\mathbb {E}}[FD(T_n)] \ge cn\) for some constant \(c>0\). Suppose that for each \(x\in X_n\), \(FD(\{x\})\le B_n\), where \(B_n^2 = o(n)\). Then \(\frac{FD(T^s_n)}{{\mathbb {E}}[FD(T^s_n)]} \xrightarrow {P} 1\) as \(n \rightarrow \infty \).

Proof

Let \(W_n = H(\mathbf{Y_n})\), where \(\mathbf{Y_n}=\{Y_i:i\in [n]\}\) is the sequence of Bernoulli random variables with \(Y_i=1\) if species \(x_i\) survives the FOB extinction event and \(Y_i=0\) otherwise, and \(H(y_1, \ldots , y_n)\) is the ratio \(\frac{FD(\{x_i \in X_n: y_i=1\})}{{\mathbb {E}}[FD(T^s_n)]}\), where the numerator is as defined in Eq. (2.1) with \(\mu (f)=1\) for all f. Observe that for any particular value of \(i \in \{1, \ldots , n\}\), if we change \(y_i\) (from 0 to 1 or vice versa) to give \(y_i'\) then:

$$\begin{aligned} |H(y_1,...,y_i,...,y_n)-H(y_1,...,y_i',...,y_n)|\le & {} FD(\{x_i\})/{\mathbb {E}}[FD(T^s_n)] \\\le & {} B_n/{\mathbb {E}}[FD(T^s_n)]. \end{aligned}$$

We now apply McDiarmid’s inequality to obtain (for each \(\epsilon >0\)):

$$\begin{aligned} {\mathbb {P}}(|W_n-1| \ge \epsilon ) \le 2\exp \left( \frac{-2\epsilon ^2{\mathbb {E}}[FD(T_n^s)]^2}{nB_n^2}\right) . \end{aligned}$$
(3.4)

Now, \({\mathbb {E}}[FD(T_n^s)] \ge s\cdot {\mathbb {E}}[FD(T_n)]\) by Proposition 2.2(i) (taking \(\mu (f)=1\) for all f) and, by assumption, \({\mathbb {E}}[FD(T_n)] \ge cn\). Thus we obtain:

$$\begin{aligned} {\mathbb {P}}(|W_n- 1| \ge \epsilon ) \le 2\exp \left( \frac{-2\epsilon ^2c^2s^2n}{B_n^2}\right) . \end{aligned}$$

Therefore, \({\mathbb {P}}(|W_n- 1| \ge \epsilon ) \rightarrow 0\) as \(n\rightarrow \infty \). Since this holds for all \(\epsilon >0\), we obtain the claimed result. \(\square \)
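The final bound \(2\exp \left( -2\epsilon ^2c^2s^2n/B_n^2\right) \) tends to 0 only as fast as \(n/B_n^2\) grows. The sketch below evaluates it for the illustrative choice \(B_n = n^{1/3}\) (so that \(n/B_n^2 = n^{1/3}\)); the function name and the parameter values are assumptions made for this example.

```python
from math import exp


def lemma34_bound(n, eps, c, s, beta=1 / 3):
    """2 * exp(-2 * eps^2 * c^2 * s^2 * n / B_n^2) with the illustrative choice B_n = n^beta."""
    b_n = n ** beta
    return 2.0 * exp(-2.0 * eps ** 2 * c ** 2 * s ** 2 * n / b_n ** 2)


# Illustrative parameters: eps = 0.1, c = 1, s = 0.5.
values = [lemma34_bound(n, 0.1, 1.0, 0.5) for n in (10 ** 6, 10 ** 9, 10 ** 12)]
```

For these values the bound exceeds 1 (and so is uninformative) at \(n=10^6\) and becomes negligible only at much larger n, which suggests the lemma is best read as an asymptotic statement rather than a sharp finite-n estimate.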

We can now state the main result of this section. Recall that \(FD(T_n^s)\) is the number of features present among leaves of \(T_n\) that survive the FOB extinction event.

Theorem 3.1

Let \((T_n, n\ge 1)\) be a sequence of binary trees and let features evolve on \(T_n\) according to the stochastic feature evolution process described. If \({\mathbb {E}}[FD(T_n)] \ge cn\) for some constant \(c>0\), and if \( \frac{{\mathbb {E}}[FD(T_n^s)]}{{\mathbb {E}}[FD(T_n)]}\) converges to a constant \(c_s\) as n grows, then we have:

$$\begin{aligned} \frac{FD(T_n^s)}{FD(T_n)} \xrightarrow {P} c_s \end{aligned}$$

as \(n \rightarrow \infty \).

Before we proceed to the proof, we provide the following comments.

Fig. 2

For the tree \(T_n\) shown (with \(\ell >1\) fixed and \(s \in (0,1)\)) the sequence of trees \((T_n, n\ge 2)\) has the property that \(\frac{FD(T_n^s)}{FD(T_n)}\) does not converge in probability to any constant value as n grows

Remarks

  • Theorem 3.1 can fail without the condition \({\mathbb {E}}[FD(T_n)] \ge cn\). Fig. 2 provides a simple example of a sequence of trees \(T_n\) for which \(\frac{FD(T_n^s)}{FD(T_n)}\) does not converge in probability to any constant value as n grows. In this example, \(F_\rho =\emptyset \) and the tree has height \(2\ell \) with one leaf having an incident edge of length \(\ell \) and the remaining \(n-1\) leaves having incident edges of length 1/n.

  • A sufficient condition for the inequality \({\mathbb {E}}[FD(T_n)] \ge cn\) in Theorem 3.1 to hold is that for some \(\epsilon >0\) and \(\delta >0\) the proportion of pendant edges of \(T_n\) of length \(\ge \epsilon \) is at least \(\delta \) for all \(n \ge 1\). Briefly, the reason for this is that the expected number of features arising on each of these pendant edges and surviving to the end of this edge is bounded away from 0 and the features associated with distinct pendant edges are always different from each other, and different from any other features arising in the tree.

Proof of Theorem 3.1

Let \(A_\epsilon (n)\) be the event that \(\left| \frac{FD(T^s_n)}{{\mathbb {E}}[FD(T^s_n)]} -1 \right| < \epsilon \), and let \({\mathcal {E}}_n\) be the event described in Lemma 3.2 with \(\beta = \frac{1}{3}\). Then, \(\lim _{n \rightarrow \infty }{\mathbb {P}}(A_\epsilon (n) | {\mathcal {E}}_n) =1\), by Lemma 3.4. Now,

$$\begin{aligned} {\mathbb {P}}(A_\epsilon (n)) = {\mathbb {P}}(A_\epsilon (n) | {\mathcal {E}}_n){\mathbb {P}}({\mathcal {E}}_n) + {\mathbb {P}}(A_\epsilon (n)|\overline{{\mathcal {E}}_n}) {\mathbb {P}}(\overline{{\mathcal {E}}_n}), \end{aligned}$$

and since \({\mathbb {P}}({\mathcal {E}}_n ) \rightarrow 1\) as \(n\rightarrow \infty \) (by Lemma 3.2) we have \(\lim _{n \rightarrow \infty } {\mathbb {P}}(A_\epsilon (n)) =1\). Since this holds for all \(\epsilon >0\), it follows that \(\frac{FD(T^s_n)}{{\mathbb {E}}[FD(T^s_n)]} \xrightarrow {P} 1\) as n grows.

Moreover, from Lemma 3.3, we have \(\frac{FD(T_n)}{{\mathbb {E}}[FD(T_n)]} \xrightarrow {P} 1.\) Now we can write \(\frac{FD(T_n^s)}{FD(T_n)}\) as follows:

$$\begin{aligned} \frac{FD(T_n^s)}{FD(T_n)} = \frac{FD(T_n^s)}{{\mathbb {E}}[FD(T^s_n)]} \cdot \frac{{\mathbb {E}}[FD(T_n)]}{FD(T_n)} \cdot \frac{{\mathbb {E}}[FD(T^s_n)]}{{\mathbb {E}}[FD(T_n)]} \end{aligned}$$

The first two terms on the right of this equation each converge in probability to 1 as \(n\rightarrow \infty \), whereas the third (deterministic) term converges to \(c_s\). This completes the proof. \(\square \)

We will apply Theorem 3.1 in the next section to establish a result for FD loss on birth–death trees.

4 Feature diversity ratios in birth–death trees

In this section, we continue to investigate the stochastic model of feature gain and loss, but rather than considering fixed trees, we will now allow the trees themselves to be stochastically generated, following the simple birth–death processes that are common in phylogenetics. Thus, there will now be three stochastic processes in play: the linear-birth/linear-death process that generates the tree, the constant-birth/linear-death process of feature gain and loss operating along the branches of the tree, and the simple FOB extinction event at the present.

4.1 Definitions

Let \({\mathcal {T}}_t\) denote a birth–death tree grown from a single lineage for time t with birth and death parameters \(\lambda \) and \(\mu \), respectively. We will assume throughout that \(\lambda >\mu \) (since otherwise the tree is guaranteed to die out as t becomes large).

On \({\mathcal {T}}_t\), we impose the model of feature gain and loss from the previous section with parameters r and \(\nu \). We now apply the FOB model in which each extant species (i.e. each leaf of \({\mathcal {T}}_t\) that is present at time t) has a probability \(s>0\) of surviving and \(1-s\) of becoming extinct (independently across the extant species), and we let \({\mathcal {T}}_t^s\) denote the tree obtained from \({\mathcal {T}}_t\) by removing the species at the present that did not survive this process. We refer to \({\mathcal {T}}_t^s\) as the pruned tree, and the leaves of \({\mathcal {T}}_t\) and \({\mathcal {T}}_t^s\) that are present at time t as the extant species (or leaves) of these trees (to distinguish them from leaves of \({\mathcal {T}}_t\) that lie at the endpoints of any extinct lineages). If \(s<1\), there may now be fewer features present among the (probably reduced number of) extant species in \({\mathcal {T}}_t^s\) than there were in \({\mathcal {T}}_t\).

Let \(FD({\mathcal {T}}_t)\) be the discrete random variable that counts the number of features present in at least one of the extant leaves of \({\mathcal {T}}_t\), and let \(FD({\mathcal {T}}_t^s)\) denote the number of these features that are also present in at least one extant leaf in the pruned tree \({\mathcal {T}}_t^s\).

4.2 Expected feature diversity

Next, we consider the expected values of \(FD({\mathcal {T}}_t)\) and \(FD({\mathcal {T}}^s_t)\), where the expectation is taken across all three processes (the birth–death process that generates \({\mathcal {T}}_t\), feature gain and loss on this tree, and the species that survive the extinction event at the present under the FOB model). Of particular interest is the ratio of these expectations and its limit as t becomes large. Specifically, let:

$$\begin{aligned} \varphi _{FD}(t, s) = \frac{{\mathbb {E}}[FD({\mathcal {T}}_t^s)]}{{\mathbb {E}}[FD({\mathcal {T}}_t)]} \text{ and } \varphi _{FD}(s) = \lim _{t\rightarrow \infty } \varphi _{FD}(t, s). \end{aligned}$$

Note that \(\varphi _{FD}(s)\) is a function of five parameters (\(s,r, \lambda , \mu , \nu \)); however, we will show that it is just a function of s and two other parameters. Notice also that once these parameters are fixed, \(\varphi _{FD}(t, s)\) and \(\varphi _{FD}(s)\) are monotone increasing functions of s taking the value 0 at \(s=0\) and 1 at \(s=1\). Moreover, \(\varphi _{FD}(t, s)\) and \(\varphi _{FD}(s)\) are both independent of r (the rate at which features arise along any lineage) as we formally show shortly.

4.3 Relationship to phylogenetic diversity (PD)

Recall that for a rooted phylogenetic tree T with branch lengths, the PD value of a subset S of leaves, denoted \(PD(S, T)\), is the sum of the lengths of the edges of the subtree of T that connects S and the root of the tree.

In the special case where \(\nu =0\), and where no features are present at time 0, \(FD({\mathcal {T}}^s_t)\) (conditioned on \({\mathcal {T}}_t\)), has a Poisson distribution with a mean of r times \(PD({\mathcal {S}}_t, {\mathcal {T}}_t)\), where \({\mathcal {S}}_t\) is the (random) set of leaves at time t that survive the extinction event at the present. Consequently, \({\mathbb {E}}[FD({\mathcal {T}}_t^s)|{\mathcal {T}}_t] = {\mathbb {E}}[rPD({\mathcal {S}}_t, {\mathcal {T}}_t)]= r{\mathbb {E}}[PD({\mathcal {S}}_t, {\mathcal {T}}_t)]\). Similarly, \({\mathbb {E}}[FD({\mathcal {T}}_t)|{\mathcal {T}}_t] = r{\mathbb {E}}[PD({{\mathcal {L}}}_t, {\mathcal {T}}_t)]\), where \({{\mathcal {L}}}_t\) is the set of extant leaves of \({\mathcal {T}}_t\). Thus, in this special case we have \(\varphi _{FD}(s) = \varphi _{PD}(s)\), where:

$$\begin{aligned} \varphi _{PD}(s)=\lim _{t \rightarrow \infty } \frac{{\mathbb {E}}[PD({\mathcal {S}}_t, {\mathcal {T}}_t)]}{{\mathbb {E}}[PD({{\mathcal {L}}}_t, {\mathcal {T}}_t)]}. \end{aligned}$$

The function \(\varphi _{PD}(s)\) was explicitly determined in Lambert and Steel (2013), Mooers et al. (2012) as follows:

$$\begin{aligned} \varphi _{PD}(s): = {\left\{ \begin{array}{ll} \frac{\rho s}{\rho +s-1} \cdot \frac{\ln (s/(1-\rho ))}{\ln (1/(1-\rho ))}, &{} \text{ if } \rho = \mu /\lambda \ne 0, 1-s;\\ -s\ln (s)/(1-s), &{} \text{ if } \rho =0 \text{(i.e. } \text{ a } \text{ Yule } \text{ tree) };\\ (1-s)/\ln (1/s), &{} \text{ if } \rho =1-s. \end{array}\right. } \end{aligned}$$
(4.1)
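
As a quick numerical aid, Eq. (4.1) can be evaluated directly. The following is a minimal Python sketch, not part of the original analysis; the function name `phi_pd` and the tolerance used to detect the boundary case \(\rho = 1-s\) are illustrative assumptions.

```python
import math

def phi_pd(s: float, rho: float) -> float:
    """Limiting PD ratio of Eq. (4.1), with rho = mu/lambda in [0, 1) and 0 < s <= 1.

    The name and the tolerance below are illustrative choices, not from the paper.
    """
    if not (0.0 < s <= 1.0 and 0.0 <= rho < 1.0):
        raise ValueError("need 0 < s <= 1 and 0 <= rho < 1")
    if s == 1.0:
        return 1.0  # no species is removed, so no PD is lost
    if rho == 0.0:
        # Yule tree case
        return -s * math.log(s) / (1.0 - s)
    if math.isclose(rho, 1.0 - s, rel_tol=0.0, abs_tol=1e-12):
        # removable singularity of the generic formula
        return (1.0 - s) / math.log(1.0 / s)
    return (rho * s / (rho + s - 1.0)) * (
        math.log(s / (1.0 - rho)) / math.log(1.0 / (1.0 - rho))
    )
```

For instance, `phi_pd(0.5, 0.0)` evaluates the Yule case \(-s\ln (s)/(1-s)\) at \(s=0.5\), which equals \(\ln 2\).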

5 Calculating \(\varphi _{FD}(s)\)

We first recall a standard result from birth–death theory. Consider a linear birth–death process (starting with a single individual at time 0), with birth rate \(\lambda \) and death rate \(\theta \). For the individuals present at time t, sample each individual independently with sampling probability \(s>0\). Let \(X_t\) (\( t\ge 0\)) denote the number of these sampled individuals and let \(R_t^s(\lambda ,\theta )\) be the probability that \(X_t>0\). Then

$$\begin{aligned} R_t^s(\lambda ,\theta )={\left\{ \begin{array}{ll} \frac{s(\lambda -\theta )}{s\lambda +(\lambda (1-s)-\theta )e^{(\theta -\lambda )t}}, &{} \text{ if } \lambda \ne \theta ;\\ \frac{s}{1+\lambda s t}, &{} \text{ if } \lambda = \theta . \end{array}\right. } \end{aligned}$$
(5.1)

In particular, as \(t \rightarrow \infty \), \(R_t^s(\lambda ,\theta )\) converges to 0 if \(\lambda \le \theta \) and converges to a strictly positive value \(1-\theta /\lambda \) if \(\lambda > \theta \) (Kendall 1948; Yang and Rannala 1997).
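
The behaviour of Eq. (5.1) can be checked numerically. The sketch below is a direct transcription of the formula into Python; the function name `survival_prob` is an illustrative choice. Note that for \(\lambda < \theta \) the exponential term overflows a floating-point number for very large t, so this is only intended for moderate values of \((\theta -\lambda )t\).

```python
import math

def survival_prob(t: float, s: float, lam: float, theta: float) -> float:
    """R_t^s(lam, theta) from Eq. (5.1): probability that at least one
    sampled individual is present at time t (illustrative function name)."""
    if lam == theta:
        # critical case
        return s / (1.0 + lam * s * t)
    decay = math.exp((theta - lam) * t)
    return s * (lam - theta) / (s * lam + (lam * (1.0 - s) - theta) * decay)
```

At \(t=0\) the function returns s (the single initial individual is sampled with probability s), and for \(\lambda >\theta \) and large t it approaches \(1-\theta /\lambda \).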

The number of species at time t in \({\mathcal {T}}_t^s\) that have a copy of a particular feature f that arose at some fixed time \(t_0 \in (0, t)\) in \({\mathcal {T}}_t\) is described exactly by the birth–death process \(X_{t-t_0}\) with parameters \(\lambda \) and \(\theta = \mu +\nu \) and survival probability s at the present; it follows that as t becomes large, it becomes increasingly certain that none of the species at time t in the pruned tree will contain feature f if \(\lambda \le \mu +\nu \), whereas if \(\lambda > \mu +\nu \), there is a positive limiting probability that f will be present in the extant leaves of the pruned tree.

Since \(\varphi _{FD}(s) =\varphi _{PD}(s)\) (as given by Eqn. (4.1)) for all values of s when \(\nu =0\), in this section we will assume that \(\nu >0\) (in addition to our universal assumption that \(\lambda > \mu \)). Our main result provides an explicit formula for \(\varphi _{FD}(s)\) in Part (a), and describes some of its key properties in Parts (b) and (c).

Theorem 5.1

Given \(\lambda >\mu \) and \(\nu >0\), let \(\rho =\frac{\mu +\nu }{\lambda }\), \(\beta =1- \frac{\lambda -\mu }{\nu }\). Then:

  1. (a)
    $$\begin{aligned} \varphi _{FD}(s) =s \cdot \frac{I(s)}{I(1)}, \end{aligned}$$

    where

    $$\begin{aligned} I(s) = \int _0^1 \frac{dx}{1 - s\left( \frac{1-x^\beta }{1-\rho }\right) }, \text{ when } \rho \ne 1 \end{aligned}$$

    and

    $$\begin{aligned} I(s) = \int _0^1 \frac{dx}{\nu /\lambda - s\ln x}, \text{ when } \rho =1. \end{aligned}$$
  2. (b)

    Conditional on the non-extinction of \({\mathcal {T}}_t\), \(\frac{FD({\mathcal {T}}_t^s)}{FD({\mathcal {T}}_t)}\) converges in probability to \(\varphi _{FD}(s)\) as \(t \rightarrow \infty \).

  3. (c)

    \(\varphi _{FD}(s)\) is an increasing concave function that satisfies \(1\ge \varphi _{FD}(s) \ge s\) for all s.
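
The formula in Part (a) is straightforward to evaluate numerically. The following Python sketch uses a plain midpoint rule on (0, 1), which avoids evaluating the integrand at \(x=0\); the function names, the number of subintervals and the choice of quadrature rule are all illustrative assumptions rather than the method used to produce the figures. When \(\nu \) is very small relative to \(\lambda -\mu \) (so that \(\beta \) is large and negative), the term \(x^\beta \) can overflow and a more careful scheme would be needed.

```python
import math

def _midpoint(f, n: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of f over (0, 1)."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

def integral_I(s: float, lam: float, mu: float, nu: float, n: int = 20000) -> float:
    """I(s) from Part (a), with rho = (mu + nu)/lam and beta = 1 - (lam - mu)/nu."""
    rho = (mu + nu) / lam
    beta = 1.0 - (lam - mu) / nu
    if math.isclose(rho, 1.0, rel_tol=0.0, abs_tol=1e-12):
        return _midpoint(lambda x: 1.0 / (nu / lam - s * math.log(x)), n)
    return _midpoint(
        lambda x: 1.0 / (1.0 - s * (1.0 - x ** beta) / (1.0 - rho)), n
    )

def phi_fd(s: float, lam: float, mu: float, nu: float, n: int = 20000) -> float:
    """phi_FD(s) = s * I(s) / I(1), assuming lam > mu >= 0, nu > 0, 0 < s <= 1."""
    if not (lam > mu >= 0.0 and nu > 0.0 and 0.0 < s <= 1.0):
        raise ValueError("need lam > mu >= 0, nu > 0 and 0 < s <= 1")
    return s * integral_I(s, lam, mu, nu, n) / integral_I(1.0, lam, mu, nu, n)
```

As a check, for a Yule tree with \(\nu /\lambda = 0.5\) (so \(\rho = 0.5\) and \(\beta =-1\)) the integrals can be computed by hand: \(I(0.5) = 1/2\) and \(I(1) = 2\ln 2 - 1\), giving \(\varphi _{FD}(0.5) = 0.25/(2\ln 2 -1)\), which the sketch reproduces to within quadrature error.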

Remarks

  1. (i)

    Notice that although \(\varphi _{FD}(s)\) depends on five parameters (\(r,s,\lambda , \mu , \nu \)), Theorem 5.1(a) reveals that just three derived parameters suffice to determine \(\varphi _{FD}(s)\), namely s and the ratios \(\rho _1=\mu /\lambda \in [0,1)\) and \(\rho _2=\nu /\lambda \) (these determine \(\rho \) and \(\beta \), since \(\rho =\rho _1+\rho _2\) and \(\beta = 1-1/\rho _2 +\rho _1/\rho _2\)). Notice also that \(\rho =1 \Leftrightarrow \beta =0\) and \(\rho>1 \Leftrightarrow \beta >0\).

  2. (ii)

    Our proof relies on establishing the following exact expression for \({\mathbb {E}}[FD({\mathcal {T}}_t^s)]\):

    $$\begin{aligned} {\mathbb {E}}[FD({\mathcal {T}}_t^s)]=re^{(\lambda -\mu )t}\int _0^te^{-(\lambda -\mu )\tau }R_\tau ^s(\lambda ,\mu +\nu )d\tau + F_0 R_t^s(\lambda , \mu +\nu ).\nonumber \\ \end{aligned}$$
    (5.2)

    where \(F_0\) is the number of features present at time \(t=0\).

    In particular, this also provides an exact expression for \(\varphi _{FD}(t, s)\). Notice that the expected number of features present in the pruned tree, divided by the expected number of species in the pruned tree, is \(\frac{{\mathbb {E}}[FD({\mathcal {T}}_t^s)]}{se^{(\lambda -\mu )t}}\), and this ratio converges to \(\frac{r(1-\rho )}{\nu } \int _0^1 \frac{dx}{1-s-\rho + sx^\beta }\) as \(t \rightarrow \infty \) when \(\rho \ne 1\) (via a further analysis of Eq. (5.2) in the Appendix).

  3. (iii)

    We saw in Sect. 4.3 that when \(\nu =0\), \(\varphi _{FD}(s) = \varphi _{PD}(s)\). At the other extreme, if \(\lambda \) and \(\mu \) are fixed, and we let \(\nu \rightarrow \infty \), then \(\varphi _{FD}(s)\) converges to s (since \(\rho \rightarrow \infty \) and \(\beta \rightarrow 1\) in Theorem 5.1(a), so the integrand defining I(s) converges to 1). Informally, when \(\nu \) is large compared to \(\lambda \), most of the features present among the extant leaves of \({\mathcal {T}}_t\) will have arisen near the end of the pendant edges incident with these extant leaves (a formalisation of this claim appears in Rosindell et al. (2022)); if we now apply the FOB model then the expected proportion of these features that survive will be close to the expected proportion of leaves that survive, namely s.
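
The exact expression in Eq. (5.2) can also be evaluated numerically for finite t. The Python sketch below integrates Eq. (5.2) with a midpoint rule and forms the finite-time ratio \(\varphi _{FD}(t,s)\); the function names, the quadrature rule and the default step count are illustrative assumptions. The helper `survival_prob` is a transcription of Eq. (5.1).

```python
import math

def survival_prob(t: float, s: float, lam: float, theta: float) -> float:
    """R_t^s(lam, theta) from Eq. (5.1)."""
    if lam == theta:
        return s / (1.0 + lam * s * t)
    decay = math.exp((theta - lam) * t)
    return s * (lam - theta) / (s * lam + (lam * (1.0 - s) - theta) * decay)

def expected_fd(t, s, r, lam, mu, nu, f0=0.0, n=40000):
    """E[FD(T_t^s)] from Eq. (5.2), using a midpoint rule on [0, t].

    f0 is the number of features present at time 0 (F_0 in the text).
    """
    theta = mu + nu
    h = t / n
    integral = h * sum(
        math.exp(-(lam - mu) * tau) * survival_prob(tau, s, lam, theta)
        for tau in ((i + 0.5) * h for i in range(n))
    )
    return (r * math.exp((lam - mu) * t) * integral
            + f0 * survival_prob(t, s, lam, theta))

def phi_fd_t(t, s, r, lam, mu, nu, f0=0.0):
    """Finite-time ratio phi_FD(t, s); setting s = 1 gives E[FD(T_t)]."""
    return (expected_fd(t, s, r, lam, mu, nu, f0)
            / expected_fd(t, 1.0, r, lam, mu, nu, f0))
```

With \(F_0 = 0\) the factor r cancels in the ratio, and for large t the value of `phi_fd_t` approaches the limit \(\varphi _{FD}(s)\); for example, with \(\lambda =1\), \(\mu =0\), \(\nu =0.5\) and \(s=0.5\) it approaches \(0.25/(2\ln 2-1)\).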

5.1 Illustrative examples

First, consider a Yule tree (i.e. \(\mu =0\)) grown for time t. Figure 3 (left) plots \(\varphi _{FD}(s)\) for values of \(\nu /\lambda \in \{0,0.5,1,2,10\}\). When \(\nu =0\), \(\varphi _{FD}(s)\) describes the proportion of expected PD that remains in the pruned tree, and as \(\nu \) increases, \(\varphi _{FD}(s)\) converges towards s (the expected proportion of leaves that survive extinction at the present). Figure 3 (right) plots \(\varphi _{FD}(s)\) for birth–death trees with \(\mu /\lambda = 0.8\), showing a similar trend, but with \(\varphi _{FD}(s)\) lying further above the line \(y=s\).

Fig. 3

Left: The function \(\varphi _{FD}(s)\) as a function of s for pure-birth Yule trees (\(\mu =0\)). Right: The function \(\varphi _{FD}(s)\) for birth–death trees with \(\mu /\lambda = 0.8\). The curves within each graph show the effect of increasing \(\nu \), for values of \(\nu /\lambda \) between 0 (top) and 10 (bottom). The top curve also corresponds to the phylogenetic diversity ratio \(\varphi _{PD}(s)\)

5.2 Simulations

We ran simulations to test the expected relationship between \(\varphi _{FD}(s)\) and s and to obtain estimates of the standard deviation. All simulations were run in R version 4.2.1 (R Core Team 2021). We first simulated 500 Yule trees (age = 100, \(\lambda = 0.055, \mu = 0\), repeated and filtered to keep 500 trees with 250 tips) and 500 birth–death trees (age = 100, \(\lambda = 0.11, \mu = 0.088\), repeated and filtered to keep 500 trees with 250–300 tips) using the sim.bd.age function in the package TreeSim (Stadler 2011). Features were then evolved on each tree, followed by separate extinction events.

Keeping the rate of feature gain fixed (\(r = 0.3\)), we modelled five different rates of feature loss (\(\nu \in \{0, 0.5\lambda , \lambda , 2\lambda , 10\lambda \}\)). We simulated feature gain and loss on each edge using the Gillespie algorithm (Gillespie 1976), where the time until the next event (either feature gain or loss) was drawn from an exponential distribution with rate \(r + k \nu \), where k is the number of currently existing features at the start of the edge. The type of event was then determined with a Bernoulli draw with probability of a gain equal to \(r / (r + k \nu )\). At each split on the tree, all existing features were copied to the descendant edges. Each gain event created a new unique feature, and each loss event randomly selected an existing feature to eliminate from the current edge. At the end of the simulation, the presence of features on each tip of the tree was recorded.
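
The simulations described above were run in R; purely as an illustration of the Gillespie step on a single edge, a minimal Python sketch is given below. It is not the authors' code. The function name `evolve_edge`, the use of integer feature identifiers, and the choice to recompute k after every event are assumptions made for this sketch, and the copying of features to descendant edges at each split is left to the caller.

```python
import itertools
import random

def evolve_edge(features, length, r, nu, rng, new_ids):
    """Simulate feature gain (rate r) and loss (rate nu per feature) along
    one edge of the given length, returning the feature set at its end.

    Assumptions of this sketch: features are sortable identifiers, new_ids
    yields identifiers not used elsewhere, and k is updated after each event.
    """
    current = set(features)  # copy, so the caller's set is not modified
    elapsed = 0.0
    while True:
        k = len(current)
        total_rate = r + k * nu
        if total_rate <= 0.0:
            break  # no event can occur
        elapsed += rng.expovariate(total_rate)  # waiting time to next event
        if elapsed > length:
            break  # next event would fall beyond the end of the edge
        if rng.random() < r / total_rate:
            current.add(next(new_ids))  # gain: a new unique feature
        else:
            current.remove(rng.choice(sorted(current)))  # loss: uniform choice
    return current
```

A call such as `evolve_edge({0, 1, 2}, 5.0, 0.3, 0.5, random.Random(1), itertools.count(100))` returns the set of features present at the end of an edge of length 5 that started with three features.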

Extinction events were simulated by randomly selecting a proportion \(1-s\) of tips to delete (so that a proportion s of tips survive), with s ranging from 0.05 to 0.95 in steps of 0.05. The proportion of unique features remaining after each extinction event was recorded, and the results are shown in Fig. 4.
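
The extinction step can be sketched in a few lines of Python. This is only an illustration, assuming that s denotes the fraction of tips that survive (as in the FOB model of Sect. 4.1) and that the surviving tips are drawn uniformly at random without replacement; the function name and the dictionary representation of tip features are hypothetical.

```python
import random

def remaining_feature_fraction(tip_features, s, rng):
    """Fraction of distinct features still present after keeping a random
    proportion s of the tips.

    tip_features maps each tip label to the set of features it carries
    (a hypothetical data layout chosen for this sketch).
    """
    tips = sorted(tip_features)
    before = set().union(*tip_features.values())
    if not before:
        raise ValueError("no features present before extinction")
    n_keep = round(s * len(tips))  # number of surviving tips
    survivors = rng.sample(tips, n_keep)
    after = set().union(*(tip_features[tip] for tip in survivors))
    return len(after) / len(before)
```

Averaging this quantity over many trees for each value of s gives an empirical counterpart to \(\varphi _{FD}(s)\).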

Fig. 4

Simulation results for remaining feature diversity as a function of s for pure-birth Yule trees (\(\mu = 0\), left) and birth–death trees (\(\mu /\lambda =0.8\), right). Results shown for two of the five feature evolution parameter scenarios. Each point represents one simulation on one simulated tree (\(n = 500\) trees). The solid curves are the theoretical relationships between the expected proportion of remaining feature diversity and s for both of the shown feature evolution parameter scenarios

The resulting ratios of remaining features generally tracked expectations (the bias for birth–death trees when \(\nu /\lambda =0\) is likely due to our theoretical results conditioning on t rather than n). The standard deviation (SD), calculated for each \(\nu \) on both Yule and birth–death trees, was fairly consistent for all \(\nu \) except \(\nu = 10\lambda \), where it was noticeably higher, as shown in Table 1. Because the number of total events along an edge (gains and losses) is described by a Poisson distribution, its variance increases with the mean, and this may explain the higher standard deviation at the highest loss rate.

Table 1 Mean standard deviations of proportions of remaining feature diversity (each value is averaged over the 19 values of s between 0.05 and 0.95)

5.3 Features that appear in only one extant species

Let \({\mathcal {U}}_t\) denote the number of features that are present in precisely one species in \({\mathcal {T}}_t\), and let \(U_t = {\mathbb {E}}[{\mathcal {U}}_t]\). The following result describes a simple relationship between \(U_t\) and \(F_t = {\mathbb {E}}[FD({\mathcal {T}}_t)]\).

Proposition 1.4

$$\begin{aligned} \frac{dF_t}{dt} = re^{(\lambda -\mu )t} - (\mu +\nu )U_t. \end{aligned}$$
(5.3)

Proof

Let \({\mathcal {F}}_t\) denote the number of features present at time t among the leaves of \({\mathcal {T}}_t\). Consider evolving \({\mathcal {T}}_t\) for an additional (short) period \(\delta \) into the future. Then, conditional on \({\mathcal {N}}_t\) (the number of leaves of \({\mathcal {T}}_t\) present at time t) and \({\mathcal {U}}_t\):

$$\begin{aligned} {\mathcal {F}}_{t+\delta } - {\mathcal {F}}_t = {\left\{ \begin{array}{ll} 1, &{} \text{ with } \text{ probability } r{\mathcal {N}}_t \delta +o(\delta );\\ -1, &{} \text{ with } \text{ probability } (\mu +\nu ) \delta \cdot {\mathcal {U}}_t + o(\delta );\\ 0, &{} \text{ with } \text{ probability } 1-((\mu +\nu ){\mathcal {U}}_t+r{\mathcal {N}}_t)\delta + o(\delta ). \end{array}\right. } \end{aligned}$$

Applying the expectation operator (and using \({\mathbb {E}}[{\mathcal {F}}_{t+\delta }] = {\mathbb {E}}[{\mathbb {E}}[{\mathcal {F}}_{t+\delta }|{\mathcal {N}}_t, {\mathcal {U}}_t]]\)) and letting \(\delta \rightarrow 0\) leads to the equation stated. \(\square \)
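
In more detail, the final step of the proof can be written out as follows, using the standard fact that \({\mathbb {E}}[{\mathcal {N}}_t] = e^{(\lambda -\mu )t}\) for a linear birth–death process started from a single lineage. Taking expectations of the conditional increments gives:

$$\begin{aligned} {\mathbb {E}}[{\mathcal {F}}_{t+\delta }] - {\mathbb {E}}[{\mathcal {F}}_t] = r{\mathbb {E}}[{\mathcal {N}}_t]\delta - (\mu +\nu ){\mathbb {E}}[{\mathcal {U}}_t]\delta + o(\delta ) = \left( re^{(\lambda -\mu )t} - (\mu +\nu )U_t\right) \delta + o(\delta ). \end{aligned}$$

Dividing by \(\delta \) and letting \(\delta \rightarrow 0\) then yields Eq. (5.3), since \(F_t = {\mathbb {E}}[{\mathcal {F}}_t]\).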

6 Concluding comments

In this paper, we have considered, in order, three types of data to quantify the expected loss of feature diversity: sets of features across species (without any model of feature evolution or phylogeny), sets of features at the tips of a given phylogenetic tree, and sets of features at the tips of a random (birth–death) tree. The results of the earlier sections also proved helpful in establishing certain results in later sections.

In terms of wider significance to biodiversity conservation, our results and graphs in Sect. 5 suggest that the extent of relative feature diversity loss following extinction at the present is likely to be greater than that predicted by relative phylogenetic diversity loss for any given survival probability \(s \in (0,1)\).

Of course, our results are based on simple models (of feature gain and loss, and extinction at the present) and so exploring how these results might extend to more complex and realistic biological models would be a worthwhile topic for future work.