Abstract
In this contribution, a general expression is derived for the probability density of the time to the most recent common ancestor (TMRCA) of a simple birth–death tree, a widely used stochastic null model of biological speciation and extinction, conditioned on the constant birth and death rates and on the number of extant lineages. This density is contrasted with a previous result that was obtained using a uniform prior for the time of origin. The new distribution is applied to two problems of phylogenetic interest. The first is to find the probability distribution of the number of taxa existing at any time in the past in a tree with a known number of extant species and given birth and death rates. The second is to determine the TMRCA of two randomly selected taxa in an unobserved tree produced by a simple birth-only, or Yule, process. In the latter case, it is assumed that only the rate of bifurcation (speciation) and the size of the tree, i.e. its number of tips, are known. This is shown to lead to a closed-form analytical expression for the probability distribution of this parameter. The expression is based on the known mathematical form of the age distribution of Yule trees of a given size and branching rate, which is derived here de novo, and on a similar distribution that is additionally conditioned on tree age. The new distribution is the exact Yule prior for divergence times of pairs of taxa under the stated conditions and is potentially useful in statistical (Bayesian) inference studies of phylogenies.
References
Bailey NTJ (1964) The elements of stochastic processes with applications to the natural sciences. Wiley, New York
Bartoszek K, Sagitov S (2015) A consistent estimator of the evolutionary rate. J Theor Biol 371:69–78
Crawford FW, Suchard M (2013) Diversity, disparity, and evolutionary rate estimation for unresolved Yule trees. Syst Biol 62:439–455
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7(1):214–221
Felsenstein J (2004) Inferring phylogenies. Sinauer Associates, Sunderland
Gernhard T (2008) The conditioned reconstructed process. J Theor Biol 253(4):769–778
Gernhard T, Hartmann K, Steel M (2008) Stochastic properties of generalised Yule models, with biodiversity applications. J Math Biol 57(5):713–735
Heled J, Drummond AJ (2011) Calibrated tree priors for relaxed phylogenetics and divergence time estimation. Syst Biol 61(1):138–149
Ignatieva A, Hein J, Jenkins PA (2020) A characterisation of the reconstructed birth-death process through time-scaling. Theor Popul Biol 134:61–76
Kendall DG (1948) On the generalized “birth-and-death” process. Ann Math Stat 19:1–15
Mulder WH (2011) Probability distributions of ancestries and genealogical distances on stochastically generated rooted binary trees. J Theor Biol 280(1):139–145 (Addendum: J Theor Biol 314 (2012): 216–217)
Mulder WH, Crawford FW (2015) On the distribution of interspecies correlation for Markov models of character evolution on Yule trees. J Theor Biol 364:275–283
Nee S (2006) Birth-death models in macroevolution. Annu Rev Ecol Evol Syst 37:1–17
Nee S, May RM, Harvey PH (1994) The reconstructed evolutionary process. Philos Trans R Soc Ser B Biol Sci 344(1309):305–311
Rannala B, Yang Z (1996) Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J Mol Evol 43(3):304–311
Rosenberg NA (2006) The mean and variance of the numbers of r-pronged nodes and r-caterpillars in Yule-generated genealogical trees. Ann Combin 10(1):129–146
Rosenberg NA, Feldman MW (2002) The relationship between coalescence times and species divergence times. In: Slatkin M, Veuille M (eds) Modern developments in theoretical population genetics, vol 9. Oxford University Press, Oxford, pp 130–164
Sheinman M, Massip F, Arndt PF (2015) Statistical properties of pairwise distances between leaves on a random Yule tree. PLoS ONE 10(3):e0120206
Stadler T (2009) On incomplete sampling under birth and death models and connections to the sampling-based coalescent. J Theor Biol 261(1):58–66
Stadler T (2010) Sampling-through-time in birth-death trees. J Theor Biol 267:396–404
Stadler T, Steel M (2012) Distribution of branch lengths and phylogenetic diversity under homogeneous speciation models. J Theor Biol 297:33–40
Steel M, McKenzie A (2001) Properties of phylogenetic trees generated by Yule-type speciation models. Math Biosci 170:91–112
Steel M, Mooers A (2010) The expected length of pendant and interior edges of a Yule tree. Appl Math Lett 23(11):1315–1319
Yule GU (1924) A mathematical theory of evolution, based on the conclusions of Dr. J.C. Willis, FRS. Philos Trans R Soc Lond B 213:21–87
Acknowledgements
I thank the two anonymous reviewers for their helpful comments and suggestions.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Comparison with Age Distribution Based on an Improper Uniform Prior for the Time of Origin
In a previous study on the density of speciation times, Gernhard (2008) considers the distribution of the time of origin of a BD tree having a known number of descendants at the present time. That work takes a somewhat different approach: the moment at which a tree of unknown size and unknown birth and death rates emerged is assumed to be entirely unknown and equally likely to have occurred at any time in the past. This amounts to assuming that the age τ of any tree follows an improper uniform distribution on [0, ∞), which is taken as the prior (for a more recent application of this model assumption, see Ignatieva et al. 2020). If the parameters n, λ and μ are known, application of Bayes’ theorem yields the posterior density of tree age conditioned on n, λ and μ (Gernhard 2008, Theorem 3.2); this posterior is referred to below as Eq. (A.1).
In this case, the tree starts with a single lineage which may split after some time.
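The form of such a posterior can be traced in a few lines. The following is a sketch rather than a quotation of Gernhard's theorem: it assumes λ > μ, takes \(p_{n} (\tau )\) to be Kendall's (1948) probability that a single lineage has n ≥ 1 descendants after a time τ, and uses two placeholder symbols that do not come from the text, \(u(\tau )\) as shorthand and \(\pi (\tau |n,\lambda ,\mu )\) for the normalised posterior. With a flat prior the posterior is proportional to \(p_{n} (\tau )\), so only the normalisation has to be worked out. The final expression is expected to agree with Theorem 3.2 of Gernhard (2008) and should be checked against that source.

```latex
\begin{align*}
p_n(\tau) &= \frac{(\lambda-\mu)^2\, e^{-(\lambda-\mu)\tau}}
                  {\bigl(\lambda-\mu e^{-(\lambda-\mu)\tau}\bigr)^{2}}\; u(\tau)^{\,n-1},
\qquad
u(\tau) = \frac{\lambda\bigl(1-e^{-(\lambda-\mu)\tau}\bigr)}
               {\lambda-\mu e^{-(\lambda-\mu)\tau}}, \\[4pt]
\frac{du}{d\tau} &= \frac{\lambda(\lambda-\mu)^2\, e^{-(\lambda-\mu)\tau}}
                         {\bigl(\lambda-\mu e^{-(\lambda-\mu)\tau}\bigr)^{2}}
\;\Longrightarrow\;
p_n(\tau) = \frac{1}{\lambda}\, u^{\,n-1}\,\frac{du}{d\tau}, \\[4pt]
\int_0^\infty p_n(\tau)\,d\tau &= \frac{1}{\lambda}\int_0^1 u^{\,n-1}\,du = \frac{1}{n\lambda}, \\[4pt]
\pi(\tau \mid n,\lambda,\mu) &= n\lambda\, p_n(\tau)
= \frac{n\lambda^{n}(\lambda-\mu)^2\bigl(1-e^{-(\lambda-\mu)\tau}\bigr)^{n-1} e^{-(\lambda-\mu)\tau}}
       {\bigl(\lambda-\mu e^{-(\lambda-\mu)\tau}\bigr)^{n+1}}.
\end{align*}
```

The substitution works because \(u(0) = 0\) and \(u(\tau ) \to 1\) as τ → ∞ when λ > μ, so the integral over tree age reduces to an elementary integral over the unit interval.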
This result cannot be compared directly with Eq. (8) of the present study, which defines the time of origin of a BD tree as that of the first bifurcation. Comparing the two approaches requires a slight modification of the argument presented in Sect. 2.1: the MRCA of the subtree highlighted in Fig. 1 is now interpreted as marking the “birth” of either one of its daughter trees.
Thus, the probability that a tree that starts with a single lineage at time T – τ will have grown to size n at time T is proportional to
After normalising with \(2\sum\nolimits_{n = 0}^{\infty } {p_{n} (\tau )} = 2\), this probability is found to be simply equal to \(p_{n} (\tau )\), which should then replace \(\hat{p}_{n} (\tau )\) in Eq. (3). Otherwise, the procedure for calculating the new distribution, which will be denoted as \(\tilde{P}(\tau |n,\lambda ,\mu )\), is exactly the same as that followed in deriving Eq. (8).
The analogue of Eq. (5) is
where \(\tilde{C}_{n} (T,\lambda ,\mu )\) is the normalisation factor which, in the limit T → ∞, follows from
With tree age τ thus redefined, its distribution now becomes
which differs from Eq. (A.1) by an extra factor \(e^{ - (\lambda - \mu )\tau }\) (and hence, the normalisation constant will also be different). It should be noted that this is precisely the factor by which the average total population from which the clade is sampled would have shrunk when returning to the point where the MRCA emerged.
That the results are not identical is therefore not surprising based on the different perspectives, viz. a uniform prior on time of origin vs. a picture based on a subtree embedded in an exponentially increasing population (assuming λ and μ known).
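The effect of the extra factor can be illustrated numerically. The sketch below is not the author's code: it assumes λ > μ, takes Kendall's (1948) form for \(p_{n} (\tau )\), uses the flat-prior posterior \(n\lambda p_{n} (\tau )\) as the uniform-prior age density, and builds the redefined age density only up to normalisation as \(p_{n} (\tau )e^{ - (\lambda - \mu )\tau }\), following the statement above that the two differ by that factor. The function names `p_n`, `trapezoid` and `compare_age_densities` and the cut-off `t_max` are hypothetical choices made for illustration.

```python
import math


def p_n(tau, n, lam, mu):
    """Kendall's P(one lineage has n >= 1 descendants after time tau); assumes lam > mu."""
    r = lam - mu
    e = math.exp(-r * tau)
    denom = lam - mu * e
    return r * r * e * (lam * (1.0 - e)) ** (n - 1) / denom ** (n + 1)


def trapezoid(f, a, b, steps):
    """Composite trapezoidal rule for f on [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def compare_age_densities(n, lam, mu, steps=20000):
    """Numerically compare the uniform-prior age density with its tilted counterpart."""
    r = lam - mu
    # Assumed cut-off: both integrands decay at least like exp(-r * tau),
    # so the neglected tail beyond 40 / r is of order exp(-40).
    t_max = 40.0 / r

    def flat(tau):
        # Posterior under the improper uniform prior: n * lam * p_n(tau).
        return n * lam * p_n(tau, n, lam, mu)

    def tilted(tau):
        # Same shape multiplied by the extra factor; not yet normalised.
        return p_n(tau, n, lam, mu) * math.exp(-r * tau)

    flat_norm = trapezoid(flat, 0.0, t_max, steps)
    tilted_norm = trapezoid(tilted, 0.0, t_max, steps)
    flat_mean = trapezoid(lambda t: t * flat(t), 0.0, t_max, steps) / flat_norm
    tilted_mean = trapezoid(lambda t: t * tilted(t), 0.0, t_max, steps) / tilted_norm
    return {
        "uniform_prior_norm": flat_norm,    # should be close to 1
        "uniform_prior_mean": flat_mean,
        "tilted_norm": tilted_norm,         # normalisation constant of the tilted density
        "tilted_mean": tilted_mean,
    }


if __name__ == "__main__":
    for key, value in compare_age_densities(n=5, lam=2.0, mu=1.0).items():
        print(f"{key}: {value:.6f}")
```

Because the extra factor decreases with τ, the tilted density places more weight on younger trees, so its mean age comes out smaller than that of the uniform-prior density for the same n, λ and μ.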
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mulder, W.H. Probability Distribution of Tree Age for the Simple Birth–Death Process, with Applications to Distributions of Number of Ancestral Lineages and Divergence Times for Pairs of Taxa in a Yule Tree. Bull Math Biol 85, 94 (2023). https://doi.org/10.1007/s11538-023-01196-7
DOI: https://doi.org/10.1007/s11538-023-01196-7