Background

A key recurring question in evolutionary biology is what evolutionary novelties lead to shifts in speciation and/or extinction rates. Historically, the fossil record has been used to address this question. More recently, developments in phylogenetic comparative methods present an alternative to use for groups that have poor to nonexistent fossil records [1]. Among these developments are probabilistic models that enable researchers to infer both the evolution of the character traits responsible for diversification patterns as well as the diversification rates themselves [2, 3]. Phylogenetic comparative methods have now been directed at a variety of different taxonomic groups (e.g. flowering plants [4]; mammals [5]; birds [6]).

A limitation of phylogenetic methods is that phylogenetic trees represent only the ancestors of termini represented in the phylogeny. Reconstructing the richness of unsampled subclades is extremely difficult. An important assumption of many phylogenetic methods is that they require ultrametric trees and thus require all termini to represent extant taxa, with the result that extinct lineages within a study group are necessarily left unsampled and so become problematic [7,8,9]. In theory, entirely neontological datasets could be used to reconstruct histories of declining clades because the likelihood functions for net-negative diversification rates are well-defined. However, neontological studies from real clades that are in decline often fail to identify the net-negative diversification rates these lineages experienced in the past [8]. Many extant groups were more diverse in the past, including lineages of invertebrates (e.g. brachiopods, bryozoans, crinoids [10]), vertebrates (gar fish, crurotarsans, hominids [11]), and plants (lycopods, cupressaceous conifers, gnetophytes, sycamores [12]). The ubiquity of declining lineages suggests that such decline is a general behavior of clades, with diversifying lineages merely having yet to reach the declining phase of their history [13].

Multiple studies have examined the effects that changing diversification rates have on the usefulness of phylogenetic comparative methods in reconstructing relationships between state transitions and diversification [14, 15]. Some neontological approaches enable use of phylogenetic methods that account for some of these effects (e.g. [16] for successively-diversifying subclades, [17] for density-dependent diversification). Here we examine mechanisms that limit the usefulness of phylogenetic methods to correctly infer relationships between the character states and diversification.

In this study, we set out to address three questions concerning the behavior of the binary state speciation and extinction (BiSSE) method [2], the simplest of a family of similar (SSE) methods that use a likelihood optimization routine to compare models of speciation, extinction, and state transition given a phylogenetic tree of extant taxa. Here we use BiSSE to analyze a single character with two possible states, with speciation and extinction rates associated with each state. Although SSE methods account for diversification rate shifts as a result of changes between the character states, persistently declining diversification rates still violate the assumptions of all SSE methods. Our three questions are as follows: (1) how severely must BiSSE’s assumptions of constant speciation and extinction rates be violated before it fails to be useful when modeling the effects of the character states on diversification parameters? Under the condition where the character states have no effect on speciation rate we predicted the most likely BiSSE models would be those with higher rates of speciation in a clade’s ancestral state. This is because the frequency of a character’s ancestral state must necessarily be high early in a clade’s history, since the derived state has not had a chance to evolve yet. By the time the derived state has become common, speciation rate has declined. For this reason, the most likely BiSSE model would have the derived state associated with the decline in speciation rates, even though it is not. (2) Is BiSSE’s power to detect biases in transition rates between the character states compromised if characters evolve in a punctuated fashion (sensu [18]; see also ClaSSE, [19])? We predicted that a punctuated equilibrium simulation would result in the best BiSSE model featuring the derived state having a lower rate of reversion to the ancestral state than vice-versa. The rationale is that an asymmetry in favor of the ancestral state transitioning to the derived state reflects events early in clade history when the ancestral state is common and the clade is diversifying rapidly. (3) Do decaying, constant, or increasing extinction rates also affect how useful BiSSE is for recovering speciation rate? Using reasoning similar to the first question we predicted that decreasing extinction rates associated with both character states would result in the most likely BiSSE model having the ancestral state associated with higher extinction rates.

Results

Evolutionary history reconstruction for continuous-time evolution of a neutral trait

For simulations of a neutral character for length less than one time unit, the BiSSE model with the highest likelihood rarely includes a non-neutral character (Table 1, Fig. 1b-d). For simulation lengths greater than one, however, the best model frequently does have a non-neutral character. The simulation length threshold after which the most likely BiSSE model has the character being non-neutral does depend on whether μ is changing as well as λ, or if μ is constant and only λ is time-variant.

Table 1 Proportions of simulations in which the most likely BiSSE model included character states affecting evolutionary rate when no such effect existed.
Fig. 1
figure 1

Risk of misinterpreting BiSSE results to incorrectly infer evolutionary rate asymmetry on simulated trees when no asymmetry exists. Simulation length is the amount of time the simulation ran to generate the simulated clade, with one unit being roughly the time necessary for the λ to drop below the μ (see Fig. 3). a, b λ and μ both decline, but at different rates such that μ eventually overtakes λ, as in Fig. 3a. c λ declines while μ is constant, so that eventually extinction dominates, as in Fig. 3b. d λ decreases while μ increases, as in Fig. 3c. The relative rates of speciation and extinction are the same as those described in the corresponding graph in Fig. 3. a Character evolution follows a punctuated equilibrium simulation, whereby character evolution only occurs during speciation events. b, c, d Character evolution is continuous with time and proceeds irrespective of speciation. Misinference rate represents the results of an AIC-based likelihood ratio test on BiSSE model fits in which the character states had or did not have an effect on \( \widehat{\uplambda} \), \( \widehat{\upmu} \), or \( \widehat{\mathrm{q}} \), respectively. Each combination of parameters and simulation length represents 100 simulated trees. The blue bars represent differences in \( \widehat{\uplambda} \), the red bars \( \widehat{\upmu} \), and the green bars \( \widehat{\mathrm{q}} \). Error bars represent the 95% confidence limits on the actual rate of misinference based on the size of the sample. In these simulations, the true evolutionary rates associated with ancestral and derived states are equal. Abbreviations of rates and symbols are in Table 5

When the most likely BiSSE model allows \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1, this asymmetry is not always \( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1 (Table 2). For simulations in which the best model includes speciation rate asymmetry, Table 3 shows only those runs on which the likelihood ratio test identified the difference between the model in which \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1 is better than the model in which \( \widehat{\uplambda} \)0 = \( \widehat{\uplambda} \)1 is statistically significant at the 0.05 level. Excluding simulations for which the difference between models is not significant, \( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1 is more common for simulations greater than length one in both our punctuated equilibrium simulation and in our constant μ simulation. The trend of \( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1 occurring more often with increasing clade age does not occur in our continuous time models in which μ is not constant, however (Table 3).

Table 2 Estimates of systematic bias in directionality of differences in evolutionary rates of most likely BiSSE models when no real (simulated) differences in evolutionary rates exist
Table 3 Estimates of systematic bias in directionality of differences in evolutionary rates of most likely BiSSE models when no real (simulated) differences in evolutionary rates exist.

In those simulations in which the best model has \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1, this model tends to also have \( \widehat{\upmu} \)0\( \widehat{\upmu} \)1, as well (Table 4). Of the 315 runs in which the best model had \( \widehat{\upmu} \)0\( \widehat{\upmu} \)1, 71 did not provide a better fit for \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1. Of the total 1800 runs, 468 had better fit for \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1. The most likely BiSSE model is far more likely to include rate asymmetry in both \( \widehat{\uplambda} \) and \( \widehat{\upmu} \) than would be expected by chance alone (Fisher’s exact test P = 0.0001; Table 4).

Table 4 Proportions of simulations in which the most likely BiSSE model included asymmetry in \( \widehat{\uplambda} \) versus simulations in which the most likely model included asymmetry in \( \widehat{\upmu} \)

The most likely BiSSE models often estimate \( \widehat{\upmu} \) to be zero in a number of simulations, especially of length 0.5 and lower (Table 2). However, not a single simulated clade has more than 56% of the species generated survive. Concluding that extinction is negligible because the best model includes \( \widehat{\upmu} \) = 0 would be incorrect.

Evolutionary history reconstruction for punctuated evolution of a neutral character

In general, the results of our punctuated equilibrium simulation and our continuous evolution simulation are similar, but there are some details in which they differ. In clades created using our punctuated equilibrium simulation, the most likely BiSSE models often include non-neutral characters in both old (length > 1) and very young (length = 0.25) clades, but not for intermediate age clades (0.5 and 0.75) (Table 1, Fig. 1a). Additionally, in punctuated equilibrium simulations, the most likely model is far more likely to include \( \widehat{\upmu} \) = 0 than continuous evolution simulations (Table 2).

Reconstruction for both continuous-time and punctuated evolution of a non-neutral character

In all simulations where λ0 and λ1 are unequal, the most likely BiSSE model is increasingly likely to identify \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1 as clade age increases (Fig. 2). The most likely model also more often has \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1 when λ10 is 1.33 instead of 1.05. The probability that the most likely model correctly has \( \widehat{\uplambda} \)1 > \( \widehat{\uplambda} \)0 increases even as clades reach their peak richness and go into decline. We do not observe a significant difference between simulations running a continuous-time or punctuated model of character evolution.

Fig. 2
figure 2

Ability of BiSSE fits to enable detection of speciation rate asymmetry on simulated trees when asymmetry does exist (not \( \widehat{\upmu} \) or \( \widehat{\mathrm{q}} \), unlike Fig. 1). In (a), λ1 is 1.33 times λ0 at any given point in time, but both decline exponentially with time as in Fig. 3a. Moreover, changes between character states occur only during speciation events, as predicted by punctuated equilibrium. In (b), λ1 is 1.05 times λ0, and changes between character states occur only during speciation events, as predicted by punctuated equilibrium. In (c), λ1 is 1.33 times λ0, but character evolution occurs independently of speciation. The yellow bar represents the proportion of runs in which the most likely BiSSE model did not include a difference in λ values, even when one existed. The red bar signifies runs in which the best model included asymmetry in \( \widehat{\uplambda} \), but in the wrong direction, estimating \( \widehat{\uplambda} \)0 to be greater than \( \widehat{\uplambda} \)1. The blue bar indicates that that the best model is the most correct, incorporating the real difference between λ0 and λ and found a difference of the correct sign. All three sets of bars refer to estimation speciation rates. Proportion of runs represents the proportion (out of 100 runs) that each result for simulation length. Length of simulation is scaled such that one unit is approximately the time required for speciation rate to drop below extinction rate (see Fig. 3a). Error bars represent the 95% confidence limits on the actual frequency of the particular type of estimation being represented. Assessment of statistical significance is conducted using a likelihood ratio test

When λ10 is 1.05, we also detect the bias observed when the simulated character is neutral: the most likely BiSSE model has \( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1, the opposite of what we simulated. As with the simulations with a neutral character, this effect is not detect the p = 0.05 level using our likelihood ratio test on length 1 simulations and shorter, but becomes observable above length 1. This effect is much stronger in simulations where λ10 = 1.05 than in those with λ10 = 1.33, with the former having 31 out of the 100 runs of length two exhibiting this type of misleading model fit, but only four out of 100 for the latter.

Discussion

As we predicted, when applied to declining clades the most likely BiSSE model often has the character being non-neutral even when the simulated character is neutral. Although BiSSE models more often have better support for \( \widehat{\uplambda} \)1 > \( \widehat{\uplambda} \)0 (Table 2), the asymmetry of the most likely model sometimes lies in the other direction. This bias may exist because the relatively simple models that BiSSE is using cannot simulate the more complicated ways in which these rates are changing. Since both character states have become common by the end of the simulation, it is possible that the particular configuration of the tree is better approximated by a model with \( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1, even though BiSSE models cannot accurately approximate the processes underlying the pattern of the tree. This is a similar phenomenon to that observed by Rabosky & Goldberg [14], but in our case the ‘hidden character’ is a property of clade evolution and not a second binary character as would be modeled by HiSSE [20].

We predicted that our punctuated equilibrium simulation would demonstrate \( \widehat{\mathrm{q}} \) being subject to the same biases as \( \widehat{\uplambda} \). We did observe this, but we observed that \( \widehat{\mathrm{q}} \) estimates are biased in our continuous-time character evolution model as well (Table 1, Fig. 1). As mentioned above, we suspect that the limited selection of models that BiSSE can fit is forcing character states to coevolve with the declining diversification rates because a model featuring changing diversification rates in any other way is not available.

Time-variance in μ has subtle effects on the overall trends we observe in this study, but we identify no clear patterns. In simulations of anagenetic character evolution of a neutral character, constant μ results in the best BiSSE model having a non-neutral character even in length-1 simulations (Table 1, Fig. 1), as clades are just reaching peak diversity. However, in the other simulations it is not until after length-1 that the most likely BiSSE models include non-neutral characters when simulated characters are neutral. In contrast, we predicted that BiSSE would be more biased with increasing μ instead of constant μ. An explanation for this behavior of BiSSE is not apparent.

We observed a number of other behaviors of BiSSE that we did not anticipate. For simulations in which the most likely BiSSE model includes asymmetry in \( \widehat{\uplambda} \), it is also more likely to include asymmetry in \( \widehat{\upmu} \). This is probably because the stochastic variation that creates the simulated phylogenetic trees makes for some trees that more severely violate BiSSE’s assumptions than others. More worrying is BiSSE’s tendency to fit higher likelihood scores for models featuring unequal \( \widehat{\uplambda} \) and \( \widehat{\upmu} \) in very short simulations that use the punctuated equilibrium model (Table 1, Fig. 1a). This is of concern for two reasons: first, punctuated equilibrium is commonly observed in fossil data [18, 21], meaning that cladogenic character evolution is a possibility that systematic biologists must be prepared for. Second, workers are likely to apply BiSSE to young, rapidly diversifying clades (e.g. nightshades [22]; color-varying birds [6]). We suggest that the reason the most likely models of young clades have asymmetrical speciation rates is because of stochastic variation: some clades get “lucky” and have an early burst of diversification that by chance is associated with one or the other character. Thus the BiSSE algorithm provides a better fit for a highly incorrect model with a non-neutral character than for a less incorrect model. Our minimum cutoff of 36 species in the punctuated equilibrium model would exacerbate this acquisition bias toward clades with such early bursts of diversification, and it is possible that we would not observe it were our cutoff lower. Unfortunately, the reasoning by which evolutionary biologists select clades of study has similar biases and problems.

In contrast to what Machac [15] observed with QuaSSE, increasing clade age does not decrease the usefulness of BiSSE’s model fitting to detect real asymmetry in λ. This is not a function of the older clades having more species, because the BiSSE model with the greatest likelihood is more likely to include asymmetry even as clades are in decline (Fig. 2). We suggest that this is because BiSSE is computationally simpler than QuaSSE, and has more statistical power to detect rate asymmetry. As expected, if the asymmetry in λ is small, the fit for models with \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1 is worse than when the real asymmetry is large. Moreover, in older clades the bias that causes BiSSE models to have better fits for \( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1 can overpower the improvement in model fit reflecting actual asymmetry in the other direction. If λ10 is large, real rate asymmetry overpowers the bias.

There is an alarming tendency for the most likely BiSSE model to have \( \widehat{\upmu} \)=0 for both character states. The probability of this happening is greater in longer runs or in punctuated equilibrium scenarios. These parameter estimates from the most likely BiSSE models are misleading because no more than 56% of the total species survived to the end in any of our simulations. Extinction is pervasive in the fossil record, a fact that is sometimes overlooked by studies focused on modern data (see [23] for discussion). Workers not aware of this might use BiSSE and misinterpret the results as genuinely signifying that a clade has zero extinction in its history.

We propose that BiSSE is best suited for study of clades of intermediate age, which have already undergone their initial pulse of rapid diversification but have not yet reached peak diversity. BiSSE is more likely to enable researchers to recover true rate asymmetry than constant rate estimators [15], and its ability to fit (relatively) correct models featuring true rate asymmetry improves with clade age even as assumptions of positive diversification are violated. Predictably, the larger the asymmetry in evolutionary rates, the more useful BiSSE is at detecting it. Unfortunately, in old, declining clades, as well as very young clades, the most likely BiSSE models are prone to feature rate asymmetry even if there is none. This makes clade selection an important consideration when using BiSSE.

At present, state speciation and extinction models such as BiSSE, QuaSSE, and their cousins are the only phylogenetic methods capable of assessing the effects of the character states on diversification rate as well as directionality in character evolution. Unfortunately, available SSE models are unable to make use of fossil data, generally requiring ultrametric trees. Non-SSE models such as BioGeoBEARS [24] do use fossil data for ancestral reconstructions. We suggest that future development of phylogenetic methods focus on the ability to incorporate fossil data. Presently, BiSSE and similar methods are useful tools in an evolutionary biologist’s arsenal, but to use them exclusively will lead to systematic bias in assessing evolutionary history. The ability of phylogenetic methods to accurately reconstruct past diversification patters is improving, but as we demonstrate there remain numerous situations where their use may be compromised [14, 15].

Conclusions

BiSSE is moderately robust to the violation of the assumptions we investigate; a clade must be in decline for it to have a high probability of generating misleading results, unlike QuaSSE [15]. Furthermore, the most likely BiSSE models often reflect real asymmetries in diversification rate associated with a character but BiSSE analyses are prone to be misleading during clade decline due to violations of its assumptions. BiSSE’s robustness in recovering differences in diversification between character states increases with greater asymmetry, but fairly modest asymmetries (e.g. a factor of 1.33) can still be detected reliably, even in clades that are in decline for part of their history.

Methods

Simulating clade diversification

To do these analyses we conducted birth-death simulations of diversification in which rate of origination (λ) and extinction (μ) change over time (see Table 5). The code for this program is contained in Additional file 1 and was run on a Unix (Mac OSX) operating system. In all of our simulations, λ starts out high and then decays exponentially through time. We ran simulations under a variety of conditions, featuring exponentially decreasing (Fig. 3a), constant (Fig. 3b), or exponentially increasing (Fig. 3c) extinction rates (Table 6). Even in the case of decreasing extinction, the rate at which extinction decays is slower than the rate at which speciation decays, with the result that speciation eventually drops below extinction, and the clade begins to lose diversity.

Table 5 List of abbreviations used referring to evolutionary rate parameters
Fig. 3
figure 3

Graphical depiction of speciation and extinction simulation. The blue line represents λ, and the red line represents μ (see Table 5). a λ and μ begin high, then decay exponentially through time, with the slower rate of decay in μ eventually leading to extinction outpacing λ. b λ begins high and decays exponentially, while μ remains constant. c λ begins high and decays, while μ begins low and increases. λ is scaled to 1 and μ to 0.5, representing that our simulations had an initial μ rate half that of λ. For trials in which the λ1 was greater than λ0, the multiplier for the derived state was in addition to that shown here; thus, λ1 began at 1.05 or 1.33 in these simulations, rather than 1. Time is scaled such that one unit represents approximately the amount of time necessary for the two rates to reach equal levels; before one time unit has passed, a clade is still diversifying, but afterwards, it is declining. For each set of state parameters, we ran 100 trials with each that ending at time t = 0.25, 0.5, 0.75, 1, 1.5, and 2. In real-world taxa, the exact length of time that one unit corresponds to, as well as the relative decay constants, varies among groups [10]

Table 6 Schematic of the data analyses. Test type refers to whether the simulation was run to test for the propensity of BiSSE to indicate a most likely model with higher diversification rates in the ancestral state when the character has no effect (“asymmetry bias”) or to test the propensity of BiSSE to indicate a most likely model with higher diversification rates in the derived state when the character does have an effect (“Asymmetry detection”).

The simulated phylogenetic trees were analyzed with BiSSE, which takes a tree with branch lengths and character states mapped onto the tips and fits models across the tree using maximum likelihood [2]. Because both our computer simulations and BiSSE are model-based, we make the following distinction for the sake of clarity: in this paper, we use the word “simulation” to refer to the simulated phylogeny incorporating different diversification parameters, the word “model” to refer to likelihood models fit using the BiSSE method, and the word “run” to represent an individual experiment from simulation to model-fitting. BiSSE models have six parameters: rate of origination (λ), extinction (μ), and state transition (q), each of which have values for derived (1) and ancestral (0) character states (Table 5). We use the term “neutral” to describe simulated characters that do not affect evolutionary rates in any way (i.e. λ0 = λ1, μ0 = μ1, q01 = q10). We likewise use the term “non-neutral” to describe a character that is not neutral by the above definition.

We conducted two sets of analyses, the first to determine if the BiSSE model with the highest likelihood includes a non-neutral character when the simulated character is neutral (e.g., λ0 = λ1 in the simulation, but the most likely parameter estimates from the BiSSE analyses have \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1, and likewise for μ and q). The second group of runs was to determine if declining speciation rate results in the most likely BiSSE model having the character being neutral when the character in the simulation is non-neutral (i.e. λ0 ≠ λ1, but \( \widehat{\uplambda} \)0 = \( \widehat{\uplambda} \)1). For this second set of analyses, we simulated two different ratios of λ1 to λ0, one in which λ1 = 1.33 x λ0, and one in which λ1 = 1.05 x λ0.

We began all simulations with extinction rate at half the speciation rate, and defined a single time unit as the time required for speciation and extinction rate to become equal (the point of peak richness). For real clades in the fossil record, the number of years to peak richness varies widely [10]. In our shortest runs (ended at 0.25 time units), speciation rate still well exceeds extinction rate and the clade is rapidly diversifying. At the end of the longest (two time units) simulations, speciation rate is half that of extinction rate and the clade is declining.

Our speciation rate asymmetries between λ0 and λ1 for non-neutral characters are smaller than those that Davis et al. [25] investigated. We did this for the following reason: at time 0, λ0 is twice μ0. At time 2, λ0 is half μ0. Because λ1 is a flat multiple of λ0, increasing the ratio of the two rates to 2:1 or beyond would result in species with the derived state continuing to have net positive diversification rates even at the end of our longest runs. We therefore used smaller, but still paleontologically realistic (e.g. [10]) values of 1.33 and 1.05 as the ratio of λ1 to λ0. Additionally, this real rate asymmetry lies in the opposite direction (i.e. λ1 > λ0) from what we predicted the most likely BiSSE model to be (\( \widehat{\uplambda} \)0 > \( \widehat{\uplambda} \)1). We did this because we hoped that the direction of the rate asymmetry in the best model would enable us to distinguish between a model-fitting bias created by our simulations’ violation of BiSSE’s assumptions as opposed to detection of real asymmetry created by our simulated non-neutral character.

We ran 100 simulations with each combination of parameters. Simulations that had fewer than 50 surviving species were re-run until the simulation had a total of 50 surviving species (36 for our punctuated equilibrium simulation – see below). Thus, each combination of parameters had a total of 100 simulations for which there was at least the minimum quota of species, and in most of our runs the number of surviving species is considerably larger (Fig. 4). We excluded simulation runs with few surviving species because Davis et al. [25] demonstrated that the statistical power to distinguish among BiSSE models is reduced when examining small clades. By setting this arbitrary cutoff, we introduce a significant acquisition bias to our study. This bias also exists in real clades, however, as workers tend to analyze large clades and especially those that have an imbalance in richness among species bearing specific traits. Because this acquisition bias exists in analyses of real clades, we decided not to correct for it, as doing so would have been both methodologically difficult and would introduce yet additional biases.

Fig. 4
figure 4

Number of species surviving to end of simulation as a function of length of our simulation. The dark lines are the median number of surviving species, the boxes the first and third quartiles, and the whiskers the maximum and minimum

Simulating morphological evolution

We ran two types of simulation of character evolution in the context of our declining speciation and extinction rates. The first (“continuous-time”) type of simulation allows the character state of a species to change at any time, not just at speciation events, and thus emulates anagenetic character evolution. We set the values of q01 and q10 to 0.6 changes per time unit. This number matches the state transition rates typical of the punctuated equilibrium simulation described below. In the continuous simulation, q10 and q01 are time-invariant, irrespective of how λ and μ may be changing. The expected number of changes along the line of any surviving species from the root of the tree to the tip is thus the same regardless of how many branching events took place. This simulation is implicit in the assumptions of BiSSE as implemented in diversitree [26], and is equivalent to “phyletic gradualism” sensu Eldredge and Gould [18].

The second (“punctuated”) type of simulation emulates cladogenetic character evolution in which state changes happen only at speciation events. As a consequence, q01 and λ0 are linked and change together, as are q10 and λ1. We used a probability of state change (in either direction) of 0.09 per branching event, which is typical of minimum steps parsimony-based phylogenetic reconstructions of fossil taxa [27]. The expected number of total state changes along any branch thus depends on the number of branching events, and is independent of time required for those branching events to occur. The shortest simulations yield a geometric mean of 4.7 changes and the longest 15.2 changes, but because of extinction, not all of these changes appear in terminal taxa. This punctuated equilibrium simulation represents yet another violation of BiSSE’s assumptions in addition to time-variant speciation and extinction rates. Newer models, such as ClaSSE [19] do allow for punctuated equilibrium scenarios. However, we chose to use BiSSE because we sought to quantify whether anagenetic and cladogenetic forms of character evolution impact the biases that we here investigate.

Assessment

We conducted BiSSE analyses of our simulated clades using the diversitree package in R [26]. BiSSE models in diversitree are compared using likelihood ratio tests in the ANOVA function in R to assess the significance level of asymmetries between \( \widehat{\uplambda} \)0 versus \( \widehat{\uplambda} \)1, \( \widehat{\upmu} \)0 versus \( \widehat{\upmu} \)1, and \( \widehat{\mathrm{q}} \)01 versus \( \widehat{\mathrm{q}} \)10. The null hypothesis of these likelihood ratio tests is that the character is neutral (i.e. \( \widehat{\uplambda} \)0 = \( \widehat{\uplambda} \)1, \( \widehat{\upmu} \)0 = \( \widehat{\upmu} \)1, \( \widehat{\mathrm{q}} \)01 = \( \widehat{\mathrm{q}} \)10). This is different from the null hypothesis of BiSSE itself, which is that there is no variation in evolutionary rates across the phylogenetic tree, which we violate already by the time-variance of speciation and extinction rates in our simulation. Thus, for the sake of clarity, we do not discuss these errors as type-I and type-II in this paper. Because of our continually decreasing speciation rates, there is no BiSSE model that correctly represents our simulated phylogeny, but some BiSSE models are more incorrect than others. We assessed the statistical significance of the frequency by which the best BiSSE model included a neutral or non-neutral character using these likelihood ratio tests (e.g., if λ0 = λ1, how often does AIC indicate that \( \widehat{\uplambda} \)0 = \( \widehat{\uplambda} \)1 is the best model and how often does AIC imply that \( \widehat{\uplambda} \)0\( \widehat{\uplambda} \)1 is a better model?).