Analysis of diversity-dependent species evolution using concepts in population genetics

In this work, we consider a two-type species model with trait-dependent speciation, extinction and transition rates under an evolutionary time scale. The scaling approach and the diffusion approximation techniques which are widely used in mathematical population genetics provide modeling tools and conceptual background to assist in the study of species dynamics, and help exploring the analogy between trait-dependent species diversification and the evolution of allele frequencies in the population genetics setting. The analytical framework specified is then applied to models incorporating diversity-dependence, in order to infer effective results from processes in which the net diversification of species depends on the total number of species. In particular, the long term fate of a rare trait may be analyzed under a partly symmetric scenario, using a time-change transform technique.


Introduction
Models of species richness based on simple birth-death mechanisms with constant speciation and extinction rates suffer from the classical dichotomy of supercritical branching processes; species must either go extinct or grow in number without bound. Such models applied to characterizing divergence of living organisms in terms of number of species in taxas and families, consequently fail to produce what is typically Sylvain Glémin and Martin Lascoux were supported by the Marie Curie Intra-European Fellowship (Grant Number IEF-623486, project SELFADAPT).
B Ingemar Kaj ikaj@math.uu.se Extended author information available on the last page of the article observed. The situation is similar for diversification models of multi-trait species based on multi-type, linear branching process theory. To qualify as a species model, at least one trait should have a strictly positive net growth rate, which again leads to the same supercritical dichotomy as for a single trait. See e.g., Haccou et al. (2005) for a general account of branching processes with emphasis on variation, growth and extinction of populations.
In reality, species numbers derived from fossil data or estimated from analysis of phylogenetic trees often tend to vary over long periods of time in a mode of stationarity or quasi-stationarity within a finite range of realistic values. It is, thus, rather natural to seek to implement in the modeling set-up, some form of population size control or diversity-dependent regulation. While logistic-type population-size dependence is successfully implemented in branching process theory and easily interpretable in term of population dynamics, the nature and causes of diversity-dependent diversification are still debated, cf. Lambert (2006), Rabosky (2013). The diversity-independent models studied in e.g., Maddison et al. (2007) and Tahir et al. (2019), assume that the perlineage rates of speciation, extinction, and transition of traits are trait-dependent but constant in the sense that they do not depend on the current number of species of a particular trait, or the current total number of species. On the other hand, systems involving diversity-dependence retard supercritical growth by regulating the increase in species numbers through birth and death rates in various ways (Quental and Marshall 2010). In the papers of Parsons and Quince (2007a, b), that consider population-level dynamics, the birth rates are assumed to be decreasing functions of the total number of individuals in the population, thus, imposing a maximal carrying capacity of the system in case the birth rates vanish at some level. In logistic branching models, additional deaths are imposed in proportion to the square of the number of individuals to slow down population growth, Lambert (2005). Similarly, Parsons et al. (2008Parsons et al. ( , 2010 apply population size dependent mortality rates. Fournier and Méléard (2004) consider an ecological system where individuals are characterized by their location, and the mortality rate depends on the local population density. In an environment of say two competing traits, both speciation and extinction rates could be diversity-dependent, and the rates could be allowed to have different sensitivity to competition (Mallet 2012). Further variations have been proposed, such as using time varying, instead of fixed, carrying capacities (Marshall and Quental 2016), and including adaptive radiations by decoupling of the diversity-dependent dynamics of a sub-clade . Also proposed are models in which positive net diversification rates are followed by negative net diversification rates to produce 'waxing and waning' diversity dynamics (Morlon 2014), and models where the speciation rates are decreasing functions of time, such that the magnitude of the decline in speciation increases as time approaches the environment's carrying capacity (Rabosky and Lovette 2008). Typical examples are discussed in e.g., Rabosky (2013) and , which summarize the parametrizations and properties of models involving density-dependence. In Rabosky (2013), Darwinian diversity-dependence and asymptotic diversity-dependence are contrasted. The former entails that a slowdown in speciation rates with diversity is in agreement with inter-species competition and Darwin's principle of divergence. The latter represents patterns of species saturation and long term stability of diversity trajectories, but does not specify clear mechanisms of cause and effect. In general, the time scales rele-vant for diversity-dependence or diversity-independence are unspecified, thus leaving open whether regulatory effects act on microevolutionary rates over shorter ecological time scales, or on macroevolutionary rates averaged over much longer geological time spans (cf. Rabosky (2013). See also, Benton and Emerson (2007)).
We present here, a framework to address diversity-independent versus diversitydependent dynamics for a family of species with a binary trait, for which the rates of creation and extinction of species as well as the rates of transition of the trait between species, are trait-dependent. Formal equivalence can be made between species level and population level dynamics, where number of species is equivalent to number of individuals, diversification rate is equivalent to growth rate, and transition in species traits is equivalent to mutation (e.g. Chevin (2016), Vellend (2010)). So, useful insights can be gained from using concepts of population genetics to species diversification analysis. We first rigorously investigate the mechanisms in species tree models which are analogous to those of population genetics models, such as mutation, selection, and genetic drift in allelic frequency models, and later, we apply the population genetics concepts to study diversity-dependent processes. The fraction of species in the family which carries one of the traits evolves in a manner directly comparable with the evolution of allele frequencies in the population genetics framework. The transition of a species of one trait to a species of the other trait resembles mutational change from one allelic type to another. Similarly, the trait-dependence present in creation rates, extinction rates, and the associated turnover rates, cause selection effects as well as frequency dependent genetic drift coefficients. Furthermore, various forms of population size dependence may be cast as density-dependent selection mechanisms. Despite, this formal equivalence, there are relevant biological differences between species level and population level models. For instance, whereas mutation is often much lower than selection in population genetics, there is no a priori reason to assume that rates of trait transition are of different magnitude than speciation and extinction rates. This makes species traits less "heritable" than individual traits. Besides, the number of species relevant to the species diversification dynamics is likely far less than the effective size of a species (100, 1000 for the former versus usually much more than 10 4 for the latter). This makes diversification dynamics more sensitive to stochastic effects (see below).
The central technique in our approach is a diffusion approximation of a species tree Markov chain, which runs on a suitable time scale of evolutionary time units. In particular, our scaling method incorporates the point of view that "macroevolutionary speciation rates can involve processes associated with both the splitting and extinction of populations over ecological and demographic timescales", as discussed in Rabosky (2013). From the complementary viewpoint of diffusion processes where the dynamics is provided by solutions of a PDE, our contribution relates closely to the pioneering work by Gillespie (1974). Using these links we study the effective population size in the model, and shed some light on an apparent paradoxical conclusion reported in Gillespie (1974). Several simplified, partly symmetric, parameter settings are identified in the stochastic model leading up to a reference case, or neutral case, which corresponds to fully neutral evolution in population genetics modeling.
Research has previously been carried out on similar topics, in which diffusion approximation methods were used to study density-dependent processes. Lambert (2006) provides a branching process approach to studying probability of fixation for two-allele population models, including density-dependence mechanisms. Parsons and Quince (2007a) extended a logistic growth model to a supercritical model of two competing types, and using diffusion approximation methods, estimated the probability of fixation for both types. They proposed a so called 'non neutral' model, in which the ratios of birth to death rates were assumed to be different for the two types, and showed that the type with higher birth to death rate ratio eventually takes over the entire population. In a subsequent analysis by Parsons and Quince (2007b), the fixation probability was again approximated using diffusion processes, but for 'quasi neutral' populations in which both types had equal birth to death ratios. In this case, the type with the higher birth rate shows an increase in numbers during the growth phase at densities below the carrying capacity, whereas near the carrying capacity, the type with lower birth rate is favored, due to smaller fluctuations in population density -a phenomenon termed as 'r vs K selection' (see e.g., Pianka (1970)). Abu Awad and Coron (2018) use a scaling approach similar to ours to study effective population mass, absorption and extinction times, etc., for a population controlled by a carrying capacity. Chevin (2016) utilized diffusion approximation methods to analyze the evolution of binary discrete and continuous traits, and interpreted diversification models in terms of population genetics concepts of species selection and random drift. In our current work, we develop a further direction by providing a thorough mathematical basis and an analytical framework for the analogy between binary trait species diversification and population genetics models. For completeness and clarity, we re-derived some of these previous results using our proposed analytical framework.

The species branching model
We consider the dynamics of the size of a species family which carries a binary trait, marked 0 or 1, and undergoes trait-dependent splitting, extinction, and transition. We apply a two-type, continuous time Markov process X = (X u ) u≥0 with components X = (K , L), such that X u = (K u , L u ) and K u and L u represent the number of species with trait 0 and trait 1, respectively, at time u. The two-type branching events are target state branching rate in terms of splitting rates λ 0 , λ 1 , extinction rates μ 0 , μ 1 , and transition rates δ 01 and δ 10 . For the analysis in this work, however, it will be convenient to use the equivalent set of parameters consisting of net diversification rates d i and turnover rates τ i , i = 0, 1, given by in addition to the transition rates δ 01 and δ 10 . Important aspects of the stochastic behavior of X u are determined by the eigenvalues, γ − and γ + , γ − ≤ γ + , of the 2 × 2 mean matrix For the case γ + ≤ 0 of subcritical (<) and critical (=) branching, the species family will go extinct at a random time η 0 , the extinction time, which is finite with probability one and such that X u = (0, 0) for all u ≥ η 0 . Under the supercritical assumption γ + > 1, with a positive probability strictly less than one the process goes extinct, otherwise the species family survives and grows in size without bound. The species model is a two-type branching process X = (X u ) u≥0 with linear jump rates, such that the dynamics of X = (K , L) (here viewed as column vector) satisfy where X 0 is the initial state of the model (e.g., X 0 = (1, 0) ) and (M u ) u≥0 is a stochastic (martingale) term with mean E(M u ) = 0. A general class of two-type branching processes, including population size dependent models, is characterized in Kaj and Tahir (2019) as unique solutions of such stochastic equations. We are particularly interested in the representation (P, R) of X , given by

Law of large numbers
To understand the average behavior of (K , L) and of (P, R), it is useful to consider a system which starts at time u = 0 with m species, which is equivalent to summing m i.i.d. copies of the original model starting with one initial species each. Write X (m) = (K (m) , L (m) ) for such a system, with e.g. X (m) 0 = (m, 0) , and let X m = 1 m X (m) be the resulting average. The (strong) law of large numbers applies, so for each u ≥ 0, where the limit is the expected value x u = E(X u ), which solves the linear ODE It follows from the law of large numbers for X m = ( K m , L m ), that the process (P m , R m ), defined by The limit equation for the fraction p, is a first indication of a connection to population genetics modeling, as it resembles the deterministic part of allele frequency dynamics (with mutation intensities δ 01 , δ 10 and selection intensity d 0 −d 1 ). In this model the net transition rate of trait 0, given by the term −δ 01 p u + δ 10 (1 − p u ), arises from so called anagenetic transitions between the two types. Anagenetic transitions occur along the branches of the species tree separate from speciation events. In contrast, a cladogenetic transition is a change in trait combined with the survival of the original type, hence associated with two-type speciation. To account for cladogenetic transitions occurring with probabilities a 0 and a 1 respectively for the two types, we modify the speciation and transition rates in (1) as while the extinction rates remain the same . Then the stated ODE for ( p, r ) holds with d 0 and d 1 replaced byd 0 = d 0 + a 0 δ 01 andd 1 = d 1 + a 1 δ 10 . Hence the relative impact of cladogenetic transitions as measured by a 0 and a 1 affects the "selection intensity"d 0 −d 1 . This type of transition is analogous to a 'reduction in species level heritability of the trait' (Chevin 2016).

Evolutionary time scaling
In addition to the short time scale of "species generations", we introduce an evolutionary time scale and apply diffusion approximation methods to help understand the dynamics of the species model over the long time scale. Indeed, some of the most powerful tools of mathematical population genetics rely on approximation with diffusion processes, which allow for efficient computation of quantities such as fixation probabilities, expected times to fixation, and expected frequency spectra. The starting point for this approach is to relate the Markov chain X , with the population genetics pre-limit process, such as the Wright-Fisher model or Moran model, using the time scale of generations.

Scaled Wright-Fisher and Moran models
To see the relevance of time scales, we recall the haploid version of the standard biallelic Wright-Fisher model in discrete time k ≥ 0, with fixed population size N , selection coefficient s representing reproductive weights 1 + s for trait 0 and 1 for trait 1, and probabilities p 01 and p 10 for mutations from allele 0 to 1 and 1 to 0, respectively. Letting Z 0 be the initial number of trait-0 alleles and Z k , k ≥ 1, the number of 0-alleles after selective sampling of k new generations allowing for mutational change of traits in each step, this is the Markov chain Z = (Z k ) k≥0 . The standard diffusion approximation of the Wright-Fisher model involves a re-scaling of both time and population size with N , as well as a scaling of the model parameters. For this aim, let γ , ρ 01 , and ρ 10 be the scaling parameters which control the rate at which the strength of selection and mutation tends to 0 with increasing N , as More generally, these relations can be understood as limit relations, for example s N ∼ γ /N , i.e., lim N →∞ N s N = γ . We write Z (N ) , for the associated scaled Wright-Fisher model, and consider the frequency process ξ N = (ξ N t ) t≥0 on the evolutionary time scale of N t generations, defined as The Wright-Fisher diffusion, ξ = (ξ t ) t≥0 , is the limit process of ξ N as N → ∞, and is known to be the unique, strong solution of the stochastic differential equation with B a Brownian motion. Written in the form we may refer to b as the infinitesimal mean or the diffusion drift function and to σ 2 as the infinitesimal variance, the diffusion variance, or, the genetic drift function. The Moran model applies a birth-death mechanism for reproduction of alleles, rather than the characteristic binomial sampling of the Wright-Fisher model. Since birth events, where alleles copy and spread, are always compensated for by deaths of randomly sampled individuals, the population size is still maintained at a constant level N . The corresponding rescaling and diffusion approximation as N → ∞, yield the same limit process except for a factor 2 in the variance function, σ 2 (y) = 2y(1 − y). The further parameterized case, σ 2 = y(1 − y)/N eff , can be associated with the Wright-Fisher approximation of a system modulated by an effective population size, N eff . For the general theory we refer to Etheridge (2011).

Gillespie approach
The infinitesimal generator of the Wright-Fisher diffusion process is the differential operator G, defined for a class of sufficiently regular functions f on the unit interval by The population genetics model of Gillespie (1974) allows the population size to vary following an offspring distribution with mean 1 + μ i and variance σ 2 i , for two alleles A i , i = 1, 2. Writing p for the frequency of allele A 1 and r for the total population size, this approach suggests the two-dimensional Markov generator Here, we are using r for population size as opposed to the more classical population genetics notation n as in Gillespie (1974), to avoid mix-up with our general scaling parameter n used below. The associated Kolmogorov's backward equation is dual to the corresponding forward equation given as Equation 3 in Gillespie (1974). In particular, we read out that r /( pσ 2 2 + (1 − p)σ 2 1 ) acts as a varying effective population size. It is perhaps worth stressing that this measure of effective population size varies not only with the size r but also, in general, with the frequency p, due to possibly different variance parameters σ 2 1 , σ 2 2 . Earlier references on random effective population size and stochastic population size models can be found in Kaj and Krone (2003), Sjödin et al. (2005), together with developments on coalescent models for such situations. An additional result in Gillespie (1974), in particular Equations 6,7, refers to the special case μ 1 = μ 2 saying that the probability of fixation of allele A 1 , given an initial frequency p, equals The diffusion approximation model of species richness we develop next will be closely related to the model described by Eq. (4). We will recover and discuss further the role of both the random effective population size r /( pσ 2 2 + (1 − p)σ 2 1 ) and the fixation probability u( p).

Scaled species model
Our investigation takes the view that, given non-extinction of the continuous time branching process X , the component K = (K u ) u≥0 in the species model relates naturally to the discrete time Markov chain (Z k ) k≥0 , of the population genetics model. Of course these quantities are not immediately comparable since the total number of species R varies randomly, while the population size N is fixed from one generation to the next. For this reason, we introduce a species family process X (n) = (K (n) , L (n) ) marked by a separate scaling variable n. The two-type branching process is driven by an initial number of species of magnitude n, and has scaled parameters d 1 , δ (n) 01 and δ (n) 10 , which control the rates of diversification, turnover, and transition on the population timescale. For fixed t, the fraction n −1 X (n) [nt] , is the typical configuration of trait frequencies in the species family seen over the span of t time units that elapse upon completing [nt] generations. The precise behavior of the rates as functions of n is not crucial as long as they scale for large n with the following macroevolutionary rates which act on the new time scale, Here, τ i measures variance in the production of daughter species and is the reference turnover parameter for extinction and/or speciation events of trait i, β i is the macroevolutionary net diversification rate of trait i, seen as the long-time average net effect of speciation and extinction, and ρ 01 and ρ 10 are the resulting macroevolutionary rates of exchange of traits. For later reference we also adopt from Gillespie (1977) the new evolutionary principle that trait success (fitness in population genetics terminology) not only increases with offspring mean but also may decrease with offspring variance inversely proportional to population size. Using macroevolution terminology, τ refers to the species turn-over rate, a quantity which has received increasing attention in the empirical literature aiming at interpreting patterns of species diversity (Gamisch and Comes 2019;Han et al. 2020;Nakov et al. 2019). In the current situation of developmental stochastic dynamics, as opposed to environmental stochastic influences, β i (or rather 1 + β i ) represent offspring mean and τ i offspring variance. Then, following Gillespie (1974Gillespie ( , 1977 the fitness is properly measured by Conditionally on the set of paths of X which do not go extinct, let The ratio P n will play the role of the frequency process ξ N , even though the scaled Wright Fisher process ξ N is derived from a population of fixed size N whereas P n is a frequency with respect to the stochastically varying species richness process R n . The final step towards a generic species trait richness model is finding the limit processes in the scaling regime as n → ∞, expecting the remaining macroevolutionary rates to be τ 0 , τ 1 as rates of change for the speciation/extinction process, β 0 , β 1 diversification rates for net growth (if positive) or net decline (if negative), and ρ 01 , ρ 10 controlling trait interchange. We expect in the limit to obtain a bi-variate, coupled diffusion process, X = (X t ) t≥0 , with representations (K, L) and (P, R) say. To assist in identifying the limit model we will exploit the existing theory of measurebranching processes, briefly laid out in Sect. 5 below. In fact, it is straightforward to re-interpret X (n) as a measure-valued branching process on trait space E = {0, 1} with scaled parameters. In this view our macroevolutionary scaling is consistent with a super-process approximation, for which the spatial motion is the on-off process on E with jump rates ρ 01 , ρ 10 . As a consequence, (n −1 X (n) [nt] ) t≥0 , has a weak limit which is a super-on/off process with binary branching.
where X 0 is a nonnegative deterministic column vector. As n → ∞ the scaled branching process (n −1 X (n) nt ) t≥0 , converges weakly to the continuous state branching process X = (K, L) , which is the unique solution of the SDE where B 0 , B 1 are independent, standard Brownian motions. Furthermore, (P n , R n ) converges weakly to (P, R), the unique solution of where f 0 (r ) and f 1 (r ) are the fitness measures in Eq. (7) and B − t , B + t are standard Brownian motions. The Brownian motions are independent when τ 0 = τ 1 . In general, the quadratic covariation processes of (B − , B + ) and (P, R) are given by

Effective population size: resolving Gillespie's paradox
The infinitesimal mean and infinitesimal variance terms of the diffusions P and R, except for the mutation terms involving ρ 01 and ρ 10 , as well as the covariation in Eq.
(10), have matching terms in the generator G in Eq. (4). In Eq. (8) we recognize a Wright-Fisher diffusion with selection and mutation, and with trait-dependent genetic drift (or 'species drift' acts as a stochastically varying effective population size for the differential dynamics of dP t . This observation is parallel to that in the original work Gillespie (1974), where the corresponding quantity appears in the coefficient of ∂ 2 f /∂ p 2 in Eq. (4). The species drift increases as the number of species decreases, analogous to genetic drift of the allele frequency in population genetics. Similarly, the species drift increases with the turn over rates τ 0 and τ 1 , which represent the species offspring variance. By applying Ito's formula to N t = g(P t , R t ), t ≥ 0, with g( p, r ) = r /(τ 0 (1− p)+τ 1 p), we obtain directly the dynamics of (P, N ) as an alternative to that of (P, R). For simplicity, we consider the case where the diversification rates are equal, β 0 = β 1 = β, and the rates of trait exchange vanish, ρ 01 = ρ 10 = 0. Then The benefit of this additional representation of P in comparison with the previous stochastic equation (8), is that the effective population size N in the species drift of P now appears in classical form. The disadvantage is the increased complexity of the diffusion drift for P as well as diffusion drift and variance of N , as compared to those of R in (9). The discussion in Gillespie (1974) also points to a potential drawback of the appearance of N in the present form in (11). If trait 1 is rare or near elimination from the species family, so that P ≈ 1, then formally N ≈ R/τ 1 which suggests that the nowabsent trait 1 controls the effective size through the parameter τ 1 . Quoting Gillespie (1974): This rather uncomfortable conclusion suggests that the concept of effective population size loses a good deal of its value in the context of the present model. Using the wider frame of linked diffusion processes, however, this apparent paradoxical conclusion may be avoided. The above relation for dN t suggests that if trait 1 would be eliminated and hence P reach fixation at 1, then the resulting effective size (N 0 t ) t≥0 , would satisfy and hence depend on the relative turnover ratio τ 0 /τ 1 and β, rather than just τ 1 . Of course, for the same parameter settings, if we let P tend to 1 in (9), the resulting richness R 0 t of trait 0 satisfies consistent with the relation N 0 = R 0 /τ 1 , which appears to have been the origin of the "uncomfortable conclusion" discussed in Gillespie (1974). The diffusions N 0 and R 0 , are Feller processes with linear drift belonging to the larger class of continuous state branching processes. It is a classical result in branching process theory that, if R 0 0 = r , and accordingly N 0 0 = r /τ 1 , then the extinction probabilities are given by Naturally, they no more depend on the parameter τ 1 . In response to a question raised by an anonymous referee, we also mention an alternative approach to this topic. By changing variables from ( p, r ) to ( p, n) with n = r /(σ 2 1 (1 − p) + σ 2 2 p) one obtains a new generator G f ( p, n) parallel to G f ( p, r ) in (4). The coefficients in G f ( p, n) will now have matching terms to those in the SDE for (P t , N t ).
In other words, species-level selection is reduced in small populations and enhanced in large. If τ 1 > τ 0 , the opposite holds. Hence, traits with higher diversification rates and slower turnover rates are favored by species selection. The time average of γ up to time t is can be seen as the relative contribution of τ 0 − τ 1 to species selection (Chevin 2016). The total species richness is measured by R in Eq. (9), which is a continuous state branching process with linear drift and trait-dependence in both the diffusion drift and variance functions. Conditional on non-extinction, R will grow exponentially as t → ∞. Hence, the diffusion variance term for dP t in (8), driven by B − , vanishes. The diffusion drift function for dP t also simplifies and we expect P t → p ∞ as t → ∞, where p ∞ is the solution of the second order equation The covariation of P and R in Eq. (10) measures the degree to which these quantities vary simultaneously, in the sense that the expected change over a small time interval [t, t + h), conditional on the present state (P t , R t ), is given by We observe that, while the covariation of P and R is proportional to the difference in turnover rates, τ 0 − τ 1 , the analogous covariation of the two underlying Brownian motions, B − driving P and B + driving R, is a function only of the turnover-ratio τ 0 /τ 1 . In both cases, the dependence is positive in case τ 0 > τ 1 and negative otherwise. We conclude that the species richness co-varies positively with the trait with the highest turnover. The turnover ratio τ 0 /τ 1 determines the strength of the covariation which is intrinsic at the level of Brownian noise, whereas both of τ 0 and τ 1 are required in order to find the covariation of the state variables P and R.

Special cases (a) Symmetric net diversification rates,ˇ0 =ˇ1
Let β 0 = β 1 = β. Then, from Eq. (8) and Eq. (9), we have By Eq. (10) the covariance structure remains the same as in the general model. Here, R is a supercritical Feller-type diffusion process with diffusion variance function modulated by the trait proportions P and 1 − P. The SDE for P can be viewed as a Wright-Fisher diffusion with population size dependent infinitesimal drift and variance functions. The selection coefficient −(τ 0 − τ 1 )/R shows that the species associated with the trait of the smallest of the two turnover rates, i.e. trait 0 if τ 0 < τ 1 and trait 1 if τ 0 > τ 1 , are selected for instead of species with the higher turnover. Biologically, this means that, all other things being equal, more long-lived species are selected for, as increasing the species generation time will automatically reduce τ (Lin et al. 2012).
(3), except that now the genetic drift term is inversely proportional to R, which acts as a randomly varying effective population size. At the same time, the population richness process R is a Feller-type diffusion process with diffusion drift function modulated by the trait proportions P and 1 − P.
(c) Neutral evolution: symmetric net diversification rates,ˇ0 =ˇ1 and turn over rates, 0 = 1 Let τ 0 = τ 1 = τ and β 0 = β 1 = β be nonnegative parameters. By combining the cases a) and b), we obtain Here, the first equation of P is comparable to the Wright-Fisher diffusion process Eq.
(3) with only mutation and no selection, hence the term 'neutral evolution'. The total size process is a supercritical Feller diffusion process, which we encountered previously in (12) as a limiting case under trait fixation. In the present context, we use once more that for a given initial size R 0 = r , the extinction probability that R t → 0 as t → ∞, equals e −2βr /τ . From this we retrieve the known result in stochastic demography that higher turn-over rate increases the extinction risk. On the complementary set of nonextinction, the total size process tends to infinity almost surely, R t → ∞ a.s.

(d) Quasi-neutral rates
Quasi-neutrality, as proposed and discussed in Parsons and Quince (2007b), Parsons et al. (2008), is a rate symmetry condition, which in our notation would mean According to Parsons et al. (2008), the ratio 2β i /τ i is an alternative measure of relative success or relative fitness of trait i. In this framework, therefore, quasi-neutral traits are equally fit. Eq. (8) and Eq. (9) become

Application of the population genetics approach to diversity-dependent models
We discuss the effect of diversity dependence on the species richness model. The interaction is imposed by letting the instantaneous rates of speciation and extinction in (6) depend on the scaled total size R n . A variety of interaction schemes can be incorporated into the measure-branching model formalism, which we use to derive Proposition 1. For an introduction to some of the principles of interaction in measurebranching and superprocesses, see Etheridge (2000). The relevant approach for us is the use of an analogue of the Girsanov theorem that allows for the introduction of "non-linear branching", and in the end logistic versions of the superprocess limits obtained by an absolutely continuous change-of-drift. Our analysis is therefore similar in many ways to the development of the spatial model in Fournier and Méléard (2004), considering our trait space as the mark of location. As in Fournier and Méléard (2004) we apply superprocess techniques but at this stage our interaction schemes are less general. It is also known that logistic Feller diffusion processes arise under much more general branching mechanisms than ours (Lambert 2005), indicating more general results. Suppose the current state of the species model at time t is (P n t , R n t ). A constant c will have the role of a 'carrying capacity'. We consider in Sect. 4.1 the diversity-dependent species models which have jump rates at t, given by -In greater generality, the carrying capacity is determined by a pair of parameters, c 0 , c 1 , and the strength in reduction of the net growth is trait-dependent. In Sect. 4.2 we consider the corresponding model with jump rates Further generalization can also be made by assuming that the two kinds of species both affect and are affected differently by species richness: This parallels Lotka-Volterra competition models but this latter case will not be developed further here and is left for future work. We have applied classical logistic type diversity-dependence acting on the macroevolutionary net growth rate nd (n) i . Starting from R n 0 < c, the net effect of speciations and extinctions is reduced with increasing species richness and turns into a subcritical regime above the level c, in case R n t > c. To simplify notation in the rest of this section we introduce the function

Diversity-dependent macroevolutionary diversification rates
The limit process as n → ∞ is obtained as in Proposition 1, with β 0 and β 1 at time t replaced by the species richness dependent functions β 0 (1 − R t /c) and β 1 (1 − R t /c).
The system in Eqs. (8)-(9) becomes with trait fitness now of the form In particular, if we impose the additional assumptions of neutral rates discussed as special case c) in Sect. 3.6, the process R is the solution of This is the logistic Feller diffusion process, which has non-negative paths and goes extinct with probability one. The logistic Feller diffusion is known to have a quasistationary distribution in the sense of a Yaglom limit, namely that the law of R t conditioned on {R t > 0} converges as t → ∞ to a probability measure (Cattiaux et al. 2009). Such a Yaglom limit, however, is not known in explicit terms and need not even be unique. For our model, we note that Cattiaux et al. (2009) Thm 8.2, yields uniqueness of the Yaglom limit for the case c < 1, whereas under biologically relevant conditions we expect c > 1. For the general, non-neutral, case, letting R = R/c be the total species richness in relation to the carrying capacity c, the SDE system for the state variables is The two SDEs with respect to B − and B + have second order moments proportional to 1/c. This can be used to show formally that the diffusion terms vanish in the subsequent limit of large carrying capacity, c → ∞, revealing the deterministic limit equation that is, the ODE system With initial value ( p 0 ,r 0 ) ∈ (0, 1) 2 , the equilibrium solution as t tends to infinity, is p ∞ = ρ 10 /(ρ 10 + ρ 01 ),r ∞ = 1. We see that species selection, acting on the net diversification rates β 0 and β 1 , is effective only during the growth phase but vanishes as the number of species approaches the carrying capacity, i.e., whenr t approaches 1. In ecology, this is said to be a model with only r -selected traits but no K -selected traits (where r is the growth rate and K is the carrying capacity), that is, the diversification of the two traits is regulated in the same way asr t approaches 1 (Pianka 1970). This scenario would correspond to a trait allowing adaptive radiation with a rapid diversification of species possessing this trait when many niches are still available, until an almost neutral dynamics when carrying capacity is reached.

Trait-and diversity-dependent macroevolutionary diversification rates
The density dependent mechanism is again to reduce the growth of the species family with increasing diversity, but now the efficiency of the nonlinear influence is tuned for each trait, with the resulting trait fitness By adapting the derivation leading to Eq. (8)-(9) to this case and putting R = R/c as before, then in the limit n → ∞, A special case of interest is obtained by taking β 0 /c 0 = β 1 /c 1 = α/c, τ 0 = τ 1 = τ , using a new parameter α. For simplicity, let us also take ρ 01 = ρ 10 = 0. Then In this example we notice that the growth of species richness is diversity-and densitydependent while the selection term is now density-independent. Let us now come back to the general model studied in this section and suppose that as c → ∞ then both of c 0 and c 1 also tend to infinity, such that c/c i → α i , i = 0, 1, for some positive constants α 0 and α 1 . Then formally The case β 0 = β 1 = β > 0 gives the system of ODEs r t = βr t 1 −r t (α 0 p t + α 1 (1 − p t )) .
The equation for p above shows that selection, positive or negative, strengthens with increasing species richness. In ecology, this would be equivalent to K -selection, whereby the trait less sensitive to competition is favored. For example, if c 0 > c 1 so that trait 0 is less constrained than trait 1 then the selective rate of growth of trait 0 is positive, β(α 1 −α 0 ) > 0. For both "r-like" and "K-like" cases, selection is diversitydependent. Returning to (16) and the special case α 0 β 0 = α 1 β 1 = α, although total species richness is regulated, species selection becomes diversity-independent and the selection coefficient reduces to β 0 − β 1 . This would correspond to a scenario where the total diversity is regulated from mechanisms unrelated to the focal traits. In macroevolution, several traits have been supposed to be evolutionary dead-ends, such as asexuality (Maynard-Smith 1978) or self-fertilization (Igic and Busch 2013). The classical view is that negative diversification is associated with such traits but they are continuously reintroduced through asymmetrical (or even unidirectional) shifts from traits associated with positive diversification (Goldberg et al. 2010). The above formulation suggests a more elaborate scenario whereby the dead-end trait has not necessarily a basal negative diversification rate but is more sensitive to diversity dependence. Such traits could lead to initial diversification in initially species-poor environments (i.e. much lower than the species carrying capacity) but diversification would then decrease and eventually be negative through time, as the number of species would increase. This is in agreement with recent empirical observations in the plant genus Capsella that selfing species are more sensitive to competition than outcrossing ones (Petrone Mendoza et al. 2018;Yang et al. 2018), and a plausible explanation for the hypothesis that selfing and asexual lineages "senesce" in diversification rates (Ho and Agrawal 2017).

Fixation and extinction of traits
To study trait fixation probabilities using a similar approach as in Parsons et al. (2008) we let ρ 01 = ρ 10 = 0 in Eq. (15), which means that species are unable to change traits. The initial composition of traits in the scaled species family at time 0 is a fraction x of trait 0 and the remaining fraction 1 − x consisting of trait 1 species, 0 < x < 1. As in Parsons et al. (2008), we strive to analyze the fate of the trait 0 element over time as a function of x. For this aim we must restrict further our considerations to the density dependent case with symmetric net growth rates, β 0 = β 1 = β, and a single carrying capacity c, the same for both traits. Then which is a logistic version of the model listed as special case (a) in Sect. 3.6. The crucial aspect of the resulting SDE for P, which is required for our method of proof to work, is that all dependence on R is mediated through a single function g(R), in this case g(r ) = 1/r , which appears as a multiple in both the diffusion drift term and the diffusion variance term. This is the reason why the closely related special case β 0 = β 1 , c 0 = c 1 is not covered by our method, nor is β 0 = β 1 , c 0 = c 1 or τ 0 = τ 1 , β 0 = β 1 .
In the present situation, however, general properties of logistic branching processes show that the species richness R goes extinct with probability one as t → ∞. We may then consider the sequence of stopped processes (P t∧δ n , R t∧δ n ) t≥0 , where δ n = inf{t > 0 : R t = 1/n}, and have a well-defined solution of (17) with positive richness component on {0 ≤ t ≤ δ n } for each fixed n. Yet, the mixture of traits as captured by (P t∧δ n ) t≥0 along the path to extinction of the species family, as n → ∞, would perhaps be of limited interest, indicating that the fixation probability concept might have limited relevance. On the other hand, the quasi-stationary behavior of R mentioned previously means that the time to extinction can be very long. This allows us to circumvent species extinction by modeling the species family rather using a process R + , say, which is R conditioned on the event {R t > 0, t > 0} of ultimate nonextinction. In the following we study trait absorption with respect to such nonextinct paths. Let (P + , R + ) denote the solution of (17) conditioned on the event of ultimate nonextinction. The boundaries 0 and 1 are both absorbing for P + . Let us denote by η 0 the fixation time of trait 1, that is, the random time at which P + first hits the lower boundary 0, if ever. Similarly, η 1 is the fixation time of trait 0, the time at which the upper boundary point 1 is first hit. If η 0 is finite, then all species are trait 1 from that time and onward, meaning that the upper limit is never reached, η 1 = ∞. Similarly, if η 1 < ∞ all species end up as trait 0, and η 0 = ∞. The absorption time η = min(η 0 , η 1 ) is the time of extinction or fixation, whichever occurs first. Next we recover the fixation probability (5) within our setting, and derive a result on the "species frequency spectrum".

Theorem 1
The fixation probability q(x) = P x (η 1 < ∞) of P + rendering all species trait 0 eventually, given initial frequency x of trait 0, equals Moreover, for bounded functions f defined on the unit interval [0, 1], Proof Let X = (X t ) t≥0 be a diffusion process, which solves The SDE for X is the same as that of P + under the enforced condition R + ≡ 1. It turns out that the fixation probabilities of P + and X respectively, coincide. The reason for this is that the distribution of X can be extracted from that of P + , via a random time change. Since R + t > 0 for all t, the transform of R + , given by the function is a strictly increasing random time change with time change rate 1/R + t , such that its left-inverse is continuous, and A B t = t for all t ≥ 0. Thus, The desired function q(x) is now obtained as the fixation probability of the timechanged process (P + A t ) t≥0 , which turns out to be closely connected to X . Indeed, it is a consequence of the time change result Theorem 8.5.1 in Øksendal (2007), that, for each t ≥ 0, P + A t and X t have the same distribution. Let η X 1 denote the fixation time of X . Since it follows that η X 1 and B η + 1 have the same distribution, and so q(x) = P(η X 1 < ∞). Similarly, η X and B η + have the same distribution.
The SDE for X in Eq. (18) is (14), and where X 0 = x, 0 < x < 1, and the boundary points {0, 1} are absorbing. The scale function S(x) and speed function m(x) associated with (X t ) t≥0 , X 0 = x, are defined by Here, .
By using Feller's boundary classification and the theory of one-dimensional diffusion processes (Karlin and Taylor 1981;Etheridge 2011), it follows that the boundary points 0 and 1 are both accessible from the interior state space as exit boundaries. The extinction probability is determined by the scale function (using the normalization S(0) = 0), as Moreover, for bounded functions f , It is straightforward to check that the right hand side equals To complete the proof, we first use once again that P + A t and X t have the same distribution for each fixed t, and then make the change-of-variables s = B r , to obtain

Application of Theorem 1: fixation and extinction of a rare trait
Suppose we have a family in which all species carry a single trait, and we introduce a species with a second trait of the same net growth rate into the population. What are the chances of the new trait getting fixed or lost? What are the implications if the new trait causes a shift in turnover rate? For example, all else being equal, this would correspond to a shift in life span, lifespan being inversely proportional with respect to turnover rate. As another example, recently, polyploidy (the doubling of the genome) has been shown to have no effect on diversification (β 0 = β 1 ) but to increase turnover rate (τ 0 > τ 1 ) (Zenil-Ferguson et al. 2019).
To address these questions we observe that if a small fraction x of trait-0 species are inserted into a population otherwise consisting entirely of trait-1 species, then as x → 0, In addition, by an application of the second statement in Theorem 1 with f (y) = 1, y ∈ [0, 1], the expected absorption time weighted by the inverse species richness process, satisfies as x → 0. Keeping the turnover rate τ 1 of the original trait fixed, we see that the probability of fixation of the new trait is inversely proportional to the turnover rate τ 0 of the new trait. Thus, traits which cause a burst in both speciation and extinction are suppressed whereas traits with low turnover rates are favored, with respect to possible fixation. Similarly, the expected weighted absorption time decreases with increasing turnover rate τ 0 , consistent with shorter time to extinction. An alternative interpretation of the result in Theorem 1 suggests a notion of trait frequency spectrum in analogy to the allele frequency spectrum in population genetics. For this we assume, with reference to the space and time scaling parameter n, that in each of the n time steps forming one evolutionary time unit, trait injections occur at rate θ > 0. When one of these events happens, a fraction 1/n of the family gets a new trait of turnover rate τ 0 . The remaining fraction has turnover rate τ 1 . We think of the successive injection events representing each time a new trait unrelated to previous ones. Each new trait traces out its own path P + but all relate to the same R + . Most of the new traits quickly go extinct. A few might survive for a while and even get fixed, eventually. The scaled trait 0 fixation probability per time unit is the large n limit A possible interpretation is that with larger turnover rate τ 0 of an "invading" trait, the smaller is the fixation rate τ 1 /τ 0 , and hence, the more efficient is the existing species family in purging such an intruder. For non-negative bounded functions f on the interval [0, 1] with f (0) = 0, which satisfies the integrability condition 1 0 f (y)y −1 dy < ∞, we find In particular, The weight function 2θτ −1 0 /y which arises in the above scheme of limiting expected values, plays a similar role as the stationary allele frequency spectrum in population genetics. While such a frequency weight function is not integrable over y ∈ (0, 1), and hence does not allow a probability density interpretation, it does define an intensity measure for a well-defined Poisson random measure on (0, 1). For the Poisson random field approach in population genetics, see Sawyer and Hartl (1992) and e.g. Kaj and Mugal (2016), Section 2.3. Intuitively, for each y ∈ (0, 1), (2θ/τ 0 )y −1 is in this sense the stationary intensity that (P t ) t≥0 occupies the frequency band near y, as measured by the size of (P + t /R + t ) t≥0 within the time frame during which the trait remains in the family.
The case τ 0 = τ 1 is the fully neutral case with fixation probability q(x) = x and trait spectrum intensity 2θ/y. The case τ 0 = τ 1 represents a form of trait selection, which only affects the relative magnitude of trait frequencies present in the species family, but not the shape of the trait frequency spectrum.

The super-process representation of the limit process
In this final section, we return to the two-type branching process X with rates (1) running on the scale of species generations. The aim is to provide a sketch of the proof of the limit result Proposition 1, shifting to the evolutionary time scale.
As an alternative and from a wider perspective, we may consider X as a measurebranching process with spatial motion on the trait space E = {0, 1} and spatially dependent binary branching. Each lineage in the branching tree changes its current species trait according to independent copies of the space motion J = (J u ), which is the two-state Markov jump process with jump rates δ 01 and δ 10 for transitions from 0 to 1 and vice versa. In addition to the on/off motion caused by J , the number of species of each trait develop independently as linear binary branching processes with generating functions F(z; j) = λ j z 2 + μ j λ j + μ j = d j + τ j 2τ j z + τ j − d j 2τ j , j = 0, 1.
The branching offspring distribution for daughter species of trait j has mean d j /τ j , and variance (τ j −d j )/τ j . Now let us put (1)) is a function (row vector) on E. We write E j for the expected value conditional on J 0 = j, that is X 0 = (1, 0) when j = 0 and X 0 = (0, 1) when j = 1. Then X is the measure-branching process for which the function v u defined by for j ∈ E, cf. Li (2011), Ch. 4.1. It is well-known that suitably scaled measure-valued techniques in population genetics, we applied various probabilistic methods to study the effects of diversity-dependence and related properties in the species family, hence developing a phylogenetic methodology. At the core of the study is the interpretation that phylogenetic trait frequency and population genetics allele frequency are closely related from the view point of stochastic models. We showed that evolutionary time rescaling is a powerful tool to reveal relevant analogies and distinctions, and to provide a bridge between two seemingly distant application areas. We applied methods, which rely on the analysis of trajectories of solutions, to stochastic differential equations of Wright-Fisher type with diversity dependent selection and genetic drift coefficients. Our main structural result, Proposition 1, derived from an embedding argument into an abstract two-type superprocess, provides the precise dynamics of trait frequency jointly with species richness. In parallel, we also discussed the corresponding Markov generator dynamics, referred to as the Gillespie approach. In doing so, we were able to resolve a paradoxical conclusion in Gillespie (1974), dealing with the effective population size. To put these abstract results into context, we discussed a number of special cases with partly symmetric parameter settings leading up to a fully symmetric, or neutral, case. To help clarify the role of parameters, we identified trait success (or trait fitness) functions, which are further utilized for the application of the model to diversity-dependent interaction in terms of carrying capacity. A particular logistic version of the diversification-dependent model was the subject of an in-depth study of fixation and extinction of traits. The main result, Theorem 1, provides the fixation probability as function of the initial frequency of a newly injected trait, as well as a type of trait frequency spectrum, which is analogous to allele frequency spectrum in population genetics.
Funding Open access funding provided by Uppsala University.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.