Abstract
We consider some genealogical properties of nested samples. The complete sample is assumed to have been drawn from a natural population characterised by high fecundity and sweepstakes reproduction (abbreviated HFSR). The random gene genealogies of the samples are—due to our assumption of HFSR—modelled by coalescent processes which admit multiple mergers of ancestral lineages looking back in time. Among the genealogical properties we consider are the probability that the most recent common ancestor is shared between the complete sample and the subsample nested within the complete sample; we also compare the lengths of ‘internal’ branches of nested genealogies between different coalescent processes. The results indicate how ‘informative’ a subsample is about the properties of the larger complete sample, how much information is gained by increasing the sample size, and how the ‘informativeness’ of the subsample varies between different coalescent processes.
References
Agrios, G.: Plant Pathology. Academic Press, Amsterdam (2005)
Árnason, E., Halldórsdóttir, K.: Nucleotide variation and balancing selection at the Ckma gene in Atlantic cod: analysis with multiple merger coalescent models. PeerJ 3, e786 (2015). https://doi.org/10.7717/peerj.786
Arratia, R., Barbour, A.D., Tavaré, S.: Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society (EMS), Zürich (2003)
Barney, B.T., Munkholm, C., Walt, D.R., Palumbi, S.R.: Highly localized divergence within supergenes in Atlantic cod (Gadus morhua) within the Gulf of Maine. BMC Genomics 18(1) (2017). https://doi.org/10.1186/s12864-017-3660-3
Barton, N.H., Etheridge, A.M., Véber, A.: Modelling evolution in a spatial continuum. J. Stat. Mech. 2013(01), P01,002 (2013). http://stacks.iop.org/1742-5468/2013/i=01/a=P01002
Basu, A., Majumder, P.P.: A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences. J. Genet. 82(1–2), 7–12 (2003)
Berestycki, J., Berestycki, N., Schweinsberg, J.: Beta-coalescents and continuous stable random trees. Ann. Probab. 35, 1835–1887 (2007)
Berestycki, J., Berestycki, N., Schweinsberg, J.: Small-time behavior of beta coalescents. Ann. Inst. H Poincaré Probab. Stat. 44, 214–238 (2008)
Berestycki, N.: Recent progress in coalescent theory. Ensaios Matemáticos 16, 1–193 (2009)
Bertoin, J.: Exchangeable coalescents. Cours d’école doctorale, pp. 20–24 (2010)
Bhaskar, A., Clark, A., Song, Y.: Distortion of genealogical properties when the sample size is very large. PNAS 111, 2385–2390 (2014)
Birkner, M., Blath, J.: Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J. Math. Biol. 57, 435–465 (2008)
Birkner, M., Blath, J.: Coalescents and population genetic inference. Trends Stoch. Anal. 353, 329 (2009)
Birkner, M., Blath, J., Capaldo, M., Etheridge, A.M., Möhle, M., Schweinsberg, J., Wakolbinger, A.: Alpha-stable branching and beta-coalescents. Electron. J. Probab. 10, 303–325 (2005)
Birkner, M., Blath, J., Eldon, B.: An ancestral recombination graph for diploid populations with skewed offspring distribution. Genetics 193, 255–290 (2013)
Birkner, M., Blath, J., Eldon, B.: Statistical properties of the site-frequency spectrum associated with \(\varLambda \)-coalescents. Genetics 195, 1037–1053 (2013)
Birkner, M., Blath, J., Möhle, M., Steinrücken, M., Tams, J.: A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks. ALEA Lat. Am. J. Probab. Math. Stat. 6, 25–61 (2009)
Birkner, M., Blath, J., Steinrücken, M.: Analysis of DNA sequence variation within marine species using Beta-coalescents. Theor. Popul. Biol. 87, 15–24 (2013)
Blath, J., Cronjäger, M.C., Eldon, B., Hammer, M.: The site-frequency spectrum associated with \(\varXi \)-coalescents. Theor. Popul. Biol. 110, 36–50 (2016). https://doi.org/10.1016/j.tpb.2016.04.002
Bolthausen, E., Sznitman, A.: On Ruelle’s probability cascades and an abstract cavity method. Commun. Math. Phys. 197, 247–276 (1998)
Capra, J.A., Stolzer, M., Durand, D., Pollard, K.S.: How old is my gene? Trends Genet. 29(11), 659–668 (2013)
Desai, M.M., Walczak, A.M., Fisher, D.S.: Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics 193(2), 565–585 (2013)
Dong, R., Gnedin, A., Pitman, J.: Exchangeable partitions derived from Markovian coalescents. Ann. Appl. Probab. 17, 1172–1201 (2007)
Donnelly, P., Kurtz, T.G.: Particle representations for measure-valued population models. Ann. Probab. 27, 166–205 (1999)
Donnelly, P., Tavaré, S.: Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29(1), 401–421 (1995)
Durrett, R.: Probability Models for DNA Sequence Evolution, 2nd edn. Springer, New York (2008)
Durrett, R., Schweinsberg, J.: Approximating selective sweeps. Theor. Popul. Biol. 66, 129–138 (2004)
Durrett, R., Schweinsberg, J.: A coalescent model for the effect of advantageous mutations on the genealogy of a population. Stoch. Proc. Appl. 115, 1628–1657 (2005)
Eldon, B.: Inference methods for multiple merger coalescents. In: Pontarotti, P. (ed.) Evolutionary Biology: Convergent Evolution, Evolution of Complex Traits, Concepts and Methods, pp. 347–371. Springer, New York (2016)
Eldon, B., Birkner, M., Blath, J., Freund, F.: Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents. Genetics 199, 841–856 (2015)
Eldon, B., Wakeley, J.: Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics 172, 2621–2633 (2006)
Eldon, B., Wakeley, J.: Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics 178, 1517–1532 (2008)
Etheridge, A.: Some Mathematical Models from Population Genetics. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-16632-7
Etheridge, A., Griffiths, R.: A coalescent dual process in a Moran model with genic selection. Theor. Popul. Biol. 75, 320–330 (2009)
Etheridge, A.M., Griffiths, R.C., Taylor, J.E.: A coalescent dual process in a Moran model with genic selection, and the Lambda coalescent limit. Theor. Popul. Biol. 78, 77–92 (2010)
Ewens, W.J.: Mathematical Population Genetics 1: Theoretical Introduction, vol. 27. Springer, New York (2012)
Freund, F., Möhle, M.: On the size of the block of 1 for \(\varXi \)-coalescents with dust. Modern Stoch. Theory Appl. 4(4), 407–425 (2017). https://doi.org/10.15559/17-VMSTA92
Freund, F., Siri-Jégousse, A.: Minimal clade size in the Bolthausen-Sznitman coalescent. J. Appl. Probab. 51(3), 657–668 (2014)
Goldschmidt, C., Martin, J.B.: Random recursive trees and the Bolthausen-Sznitman coalescent. Electron. J. Probab. 10(21), 718–745 (2005)
Griffiths, R.C., Tavaré, S.: Monte Carlo inference methods in population genetics. Math. Comput. Model. 23(8–9), 141–158 (1996)
Griffiths, R.C., Tavaré, S.: The age of a mutation in a general coalescent tree. Commun. Stat. Stoch. Model. 14, 273–295 (1998)
Griswold, C.K., Baker, A.J.: Time to the most recent common ancestor and divergence times of populations of common chaffinches (Fringilla coelebs) in Europe and North Africa: insights into Pleistocene refugia and current levels of migration. Evolution 56(1), 143–153 (2002)
Halldórsdóttir, K., Árnason, E.: Whole-genome sequencing uncovers cryptic and hybrid species among Atlantic and Pacific cod-fish (2015). https://www.biorxiv.org/content/early/2015/12/20/034926
Hintze, J.L., Nelson, R.D.: Violin plots: a box plot-density trace synergism. Am. Stat. 52(2), 181–184 (1998). https://doi.org/10.1080/00031305.1998.10480559
Hedgecock, D.: Does variance in reproductive success limit effective population sizes of marine organisms? In: Beaumont, A. (ed.) Genetics and Evolution of Aquatic Organisms, pp. 122–134. Chapman and Hall, London (1994)
Hedgecock, D., Pudovkin, A.I.: Sweepstakes reproductive success in highly fecund marine fish and shellfish: a review and commentary. Bull Mar. Sci. 87, 971–1002 (2011)
Hedrick, P.: Large variance in reproductive success and the \({N}_e/{N}\) ratio. Evolution 59(7), 1596 (2005). https://doi.org/10.1554/05-009
Hénard, O.: The fixation line in the \({\varLambda }\)-coalescent. Ann. Appl. Probab. 25(5), 3007–3032 (2015)
Herriger, P., Möhle, M.: Conditions for exchangeable coalescents to come down from infinity. Alea 9(2), 637–665 (2012)
Hird, S., Kubatko, L., Carstens, B.: Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling. Mol. Phylogenetics Evol. 57(2), 888–898 (2010)
Hovmøller, M.S., Sørensen, C.K., Walter, S., Justesen, A.F.: Diversity of Puccinia striiformis on cereals and grasses. Annu. Rev. Phytopathol. 49, 197–217 (2011)
Hudson, R.R.: Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23, 183–201 (1983)
Huillet, T., Möhle, M.: On the extended Moran model and its relation to coalescents with multiple collisions. Theor. Popul. Biol. 87, 5–14 (2013)
Kaj, I., Krone, S.M.: The coalescent process in a population with stochastically varying size. J. Appl. Probab. 40(01), 33–48 (2003)
King, L., Wakeley, J.: Empirical Bayes estimation of coalescence times from nucleotide sequence data. Genetics 204(1), 249–257 (2016). https://doi.org/10.1534/genetics.115.185751
Kingman, J.F.C.: The coalescent. Stoch. Proc. Appl. 13, 235–248 (1982)
Kingman, J.F.C.: Exchangeability and the evolution of large populations. In: Koch, G., Spizzichino, F. (eds.) Exchangeability in Probability and Statistics, pp. 97–112. North-Holland, Amsterdam (1982)
Kingman, J.F.C.: On the genealogy of large populations. J. Appl. Probab. 19A, 27–43 (1982)
Li, G., Hedgecock, D.: Genetic heterogeneity, detected by PCR-SSCP, among samples of larval Pacific oysters ( Crassostrea gigas ) supports the hypothesis of large variance in reproductive success. Can. J. Fish. Aquat. Sci. 55(4), 1025–1033 (1998). https://doi.org/10.1139/f97-312
May, A.W.: Fecundity of Atlantic cod. J. Fish. Res. Board Can. 24, 1531–1551 (1967)
Möhle, M.: Robustness results for the coalescent. J. Appl. Probab. 35(02), 438–447 (1998)
Möhle, M.: On sampling distributions for coalescent processes with simultaneous multiple collisions. Bernoulli 12(1), 35–53 (2006)
Möhle, M.: Coalescent processes derived from some compound Poisson population models. Electron. Commun. Probab. 16, 567–582 (2011)
Möhle, M., Sagitov, S.: A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29, 1547–1562 (2001)
Möhle, M., Sagitov, S.: Coalescent patterns in diploid exchangeable population models. J. Math. Biol. 47, 337–352 (2003)
Neher, R.A., Hallatschek, O.: Genealogies of rapidly adapting populations. Proc. Natl. Acad. Sci. 110(2), 437–442 (2013)
Niwa, H.S., Nashida, K., Yanagimoto, T.: Reproductive skew in Japanese sardine inferred from DNA sequences. ICES J. Mar. Sci. 73(9), 2181–2189 (2016). https://doi.org/10.1093/icesjms/fsw070
Oosthuizen, E., Daan, N.: Egg fecundity and maturity of North Sea cod, Gadus morhua. Neth. J. Sea Res. 8(4), 378–397 (1974)
Pettengill, J.B.: The time to most recent common ancestor does not (usually) approximate the date of divergence. PloS ONE 10(8), e0128,407 (2015)
Pitman, J.: Coalescents with multiple collisions. Ann. Probab. 27, 1870–1902 (1999)
Sagitov, S.: The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Probab. 36, 1116–1125 (1999)
Sagitov, S.: Convergence to the coalescent with simultaneous mergers. J. Appl. Probab. 40, 839–854 (2003)
Sargsyan, O., Wakeley, J.: A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theor. Popul. Biol. 74, 104–114 (2008)
Saunders, I.W., Tavaré, S., Watterson, G.A.: On the genealogy of nested subsamples from a haploid population. Adv. Appl. Probab. 16(3), 471 (1984). https://doi.org/10.2307/1427285
Schweinsberg, J.: Rigorous results for a population model with selection II: genealogy of the population. Electron. J. Probab. (2017). https://doi.org/10.1214/17-EJP58
Schweinsberg, J.: Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5, 1–50 (2000)
Schweinsberg, J.: A necessary and sufficient condition for the \(\varLambda \)-coalescent to come down from infinity. Electron. Commun. Probab. 5, 1–11 (2000)
Schweinsberg, J.: Coalescent processes obtained from supercritical Galton-Watson processes. Stoch. Proc. Appl. 106, 107–139 (2003)
Simon, M., Cordo, C.: Inheritance of partial resistance to Septoria tritici in wheat (Triticum aestivum): limitation of pycnidia and spore production. Agronomie 17(6–7), 343–347 (1997)
Slack, R.: A branching process with mean one and possibly infinite variance. Probab. Theory Relat. Fields 9(2), 139–145 (1968)
Spouge, J.L.: Within a sample from a population, the distribution of the number of descendants of a subsample’s most recent common ancestor. Theor. Popul. Biol. 92, 51–54 (2014)
Tajima, F.: Evolutionary relationships of DNA sequences in finite populations. Genetics 105, 437–460 (1983)
Timm, A., Yin, J.: Kinetics of virus production from single cells. Virology 424(1), 11–17 (2012)
Wakeley, J.: Coalescent Theory. Roberts & Co, Greenwood Village (2007)
Wakeley, J., Takahashi, T.: Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 20, 208–213 (2003)
Waples, R.S.: Tiny estimates of the \({N_e}/{N}\) ratio in marine fishes: are they real? J. Fish Biol. 89(6), 2479–2504 (2016). https://doi.org/10.1111/jfb.13143
Wiuf, C., Donnelly, P.: Conditional genealogies and the age of a neutral mutant. Theor. Popul. Biol. 56(2), 183–201 (1999). https://doi.org/10.1006/tpbi.1998.1411
Acknowledgements
We thank Alison Etheridge for many and very valuable comments and suggestions, especially regarding Theorem 1. BE was funded by DFG grant STE 325/17-1 to Wolfgang Stephan through Priority Programme SPP1819: Rapid Evolutionary Adaptation. FF was funded by DFG grant FR 3633/2-1 through Priority Program 1590: Probabilistic Structures in Evolution.
Appendices
A1 Population Models
In this section we provide a brief overview of the population models behind the coalescent processes we consider, and why we think they are interesting. A detailed description of the coalescent processes is given in Sect. A2.
A universal mechanism among all biological populations is reproduction and inheritance. Reproduction refers to the generation of offspring, and inheritance refers to the transmission of information necessary for viability and reproduction. Mendel’s laws on independent segregation of chromosomes into gametes describe the transmission of information from a parent to an offspring in a diploid population. For our purposes, however, it suffices to think of haploid populations where one can think of an individual as a single gene copy. By tracing gene copies as they are passed on from one generation to the next one automatically stores two sets of information. On the one hand one stores how frequencies of genetic types change going forwards in time; on the other hand one keeps track of the ancestral, or genealogical, relations among the different copies. This duality has been successfully exploited for example in modeling selection [34, 35]. To model genetic variation in natural populations one requires a mathematically tractable model of how genetic information is passed from parents to offspring. In the Wright–Fisher model offspring choose their parents independently and uniformly at random. Suppose we are tracing the ancestry of \(n \ge 2\) gene copies in a haploid Wright–Fisher population of N gene copies in total. For any pair, the chance that they have a common ancestor in the previous generation is 1 / N. Informally, we trace the genealogy of our gene copies on the order of \({\mathscr {O}}(N)\) generations until we see the first merger, i.e. when at least 2 gene copies (or their ancestral lines) find a common ancestor. If n is small relative to N, when a merger occurs, with probability \(1-{\mathscr {O}}( 1/N)\) it involves just two ancestral lineages. 
This means that if we measure time in units of N generations, and assume N is very large, the random ancestral relations of our sampled gene copies can be described by a continuous-time Markov chain in which each pair of ancestral lines merges at rate 1 and no other mergers are possible. We have, in an informal way, arrived at the Kingman-coalescent [56,57,58]. One can derive the Kingman-coalescent not just from the Wright–Fisher model but from any population model which satisfies certain assumptions on the offspring distribution [61, 64, 71]. These assumptions mainly dictate that higher moments of the offspring number distribution are small relative to (an appropriate power of) the population size. The Kingman-coalescent, and its various extensions, are used almost universally as the ‘null model’ for a gene genealogy in population genetics. The Kingman-coalescent is a remarkably good model for populations characterised by low fecundity, i.e. whose individuals have small numbers of offspring relative to the population size.
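The limiting description above (each pair of ancestral lines merges at rate 1, no other mergers) can be sketched in a few lines. This is a minimal illustration rather than code from the paper; the function names `kingman_tmrca` and `expected_tmrca` are hypothetical. It tracks only the number of lineages and compares the simulated mean time to the most recent common ancestor with the sum of the expected waiting times \(1/\binom{k}{2}\).

```python
import random
from math import comb


def kingman_tmrca(n, rng):
    """Simulate one Kingman n-coalescent; return (TMRCA, number of mergers)."""
    t, k, mergers = 0.0, n, 0
    while k > 1:
        # With k lineages, each of the C(k, 2) pairs merges at rate 1,
        # so the waiting time to the next merger is Exp(C(k, 2)).
        t += rng.expovariate(comb(k, 2))
        k -= 1          # a merger always joins exactly two lineages
        mergers += 1
    return t, mergers


def expected_tmrca(n):
    """Expected TMRCA: sum of the mean waiting times 1 / C(k, 2)."""
    return sum(1.0 / comb(k, 2) for k in range(2, n + 1))


rng = random.Random(1)
n = 10
sims = [kingman_tmrca(n, rng)[0] for _ in range(20000)]
mean_tmrca = sum(sims) / len(sims)
print(mean_tmrca, expected_tmrca(n))
```

Time is measured in units of N generations, as in the text, so the output is in coalescent time units.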
The classical Kingman-coalescent is derived from a population model in which the population size is constant between generations. Extensions to stochastically varying population size, in which the population size does not vary ‘too much’ between generations, have been made [54]; the result is a time-changed Kingman-coalescent. Probably the most commonly applied model of deterministically changing population size is the model of exponential population growth (see e.g. [25, 30, 41]). In each generation the population size is multiplied by a factor \((1+\beta /N)\), where \(\beta > 0\). Therefore, the population size in generation k going forward in time is given by \(N_k = N(1 + \beta / N)^k\) where N is taken as the ‘initial’ population size. It follows that, taking N as the present population size, the population size \(\lfloor Nt\rfloor \) generations ago is approximately \(Ne^{-\beta t}\) for large N. [30] show that exponential population growth can be distinguished from multiple-merger coalescents (in which at least three ancestral lineages can merge simultaneously), derived from population models of high fecundity and sweepstakes reproduction, using population genetic data from a single locus, provided that sample size and number of mutations (segregating sites) are not too small.
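As a quick numerical check of the exponential-growth scaling, the sketch below undoes \(\lfloor Nt\rfloor \) generations of growth by the factor \((1+\beta /N)\) and compares the result with \(Ne^{-\beta t}\). The function names are hypothetical, and N is assumed here to denote the present population size.

```python
from math import exp, floor


def past_size_discrete(N, beta, t):
    """Size floor(N*t) generations ago, undoing growth by (1 + beta/N) per generation."""
    return N * (1.0 + beta / N) ** (-floor(N * t))


def past_size_limit(N, beta, t):
    """Large-N approximation N * exp(-beta * t)."""
    return N * exp(-beta * t)


N, beta, t = 10**6, 2.0, 0.5
discrete = past_size_discrete(N, beta, t)
limit = past_size_limit(N, beta, t)
print(discrete, limit, abs(discrete - limit) / limit)
```

The relative difference shrinks as N grows, which is the sense in which the two expressions agree.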
A diverse group of natural populations, including some marine organisms [46], fungi [1, 51, 79], and viruses [83], are highly fecund. By way of example, individual Atlantic codfish [60, 68] and Pacific oysters [59] can lay millions of eggs. This high fecundity counteracts the high mortality rate among the larvae (juveniles) of these populations (Type III survivorship). The term ‘sweepstakes reproduction’ has been proposed to describe the reproduction mode of highly fecund populations with Type III survivorship [45]. Population models which admit high fecundity and sweepstakes reproduction (HFSR) through skewed or heavy-tailed offspring number distributions have been developed [31, 53, 64, 65, 73, 78]. In the haploid model of [78], each individual independently contributes a random number X of juveniles where \((C, \alpha > 0)\)
\[ {\mathbb {P}}(X \ge k) \sim C k^{-\alpha }, \quad k \rightarrow \infty , \tag{A28} \]
and \(x_n \sim y_n\) means \(x_n/y_n \rightarrow 1\) as \(n\rightarrow \infty \). The constant \(C > 0\) is a normalising constant, and the constant \(\alpha \) determines the skewness of the distribution. The next generation of individuals is then formed by sampling (uniformly without replacement) from the pool of juveniles. In the case \(\alpha < 2\) the random ancestral relations of gene copies can be described by specific forms of multiple-merger coalescent processes [72]. We remark that the fate of the juveniles need not be correlated to generate multiple mergers in the genealogies: the heavy-tailed distribution of juveniles means that occasionally one ‘lucky’ individual contributes a huge number of juveniles while all others contribute only a small number of juveniles. Uniform sampling without replacement from the pool of juveniles means that the lucky individual leaves significantly more descendants in the next generation than anyone else, and this is what generates multiple mergers of ancestral lines.
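One generation of this reproduction mechanism can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the juvenile law is taken to be exactly \({\mathbb {P}}(X \ge k) = k^{-\alpha }\) for \(k \ge 1\) (so \(C = 1\)), which is one distribution with the required tail, and the names `sample_juveniles` and `next_generation` are hypothetical. Because juvenile numbers can be astronomically large, the pool is never materialised; juveniles are drawn one at a time with probability proportional to the number remaining per parent, which is equivalent to uniform sampling without replacement.

```python
import random


def sample_juveniles(alpha, rng):
    """Draw X with P(X >= k) = k**(-alpha) for k = 1, 2, ... (assumed law, C = 1)."""
    u = 1.0 - rng.random()            # uniform on (0, 1]
    return int(u ** (-1.0 / alpha))   # floor of a Pareto(alpha) variable, always >= 1


def next_generation(N, alpha, rng):
    """Return (juveniles per parent, surviving offspring per parent)."""
    juveniles = [sample_juveniles(alpha, rng) for _ in range(N)]
    remaining = juveniles[:]
    offspring = [0] * N
    for _ in range(N):
        # Pick the parent of the next survivor with probability proportional
        # to its juveniles still in the pool (sampling without replacement).
        parent = rng.choices(range(N), weights=remaining)[0]
        remaining[parent] -= 1
        offspring[parent] += 1
    return juveniles, offspring


rng = random.Random(7)
juveniles, offspring = next_generation(100, 1.5, rng)
print(max(juveniles), max(offspring))
```

Since every individual produces at least one juvenile under the assumed law, the pool always contains at least N juveniles and the sampling step cannot run out.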
Coalescent processes derived from population models of HFSR (see (A28) for an example) admit multiple mergers of ancestral lineages [24, 63, 65, 70, 71, 72, 76]. Mathematically, we consider exchangeable n-coalescent processes, which are Markovian processes \((\varPi _t^{(n)})_{t\ge 0}\) on the set of partitions of \([n] := \{1,2,\ldots , n\}\) whose transitions are mergers of partition blocks (a ‘block’ is a subset of [n], see Sect. A2) with rates specified in Sect. A2. The blocks of \(\varPi _t^{(n)}\) show which individuals in [n] share a common ancestor at time t measured from the time of sampling. Thus, the blocks of \(\varPi _t^{(n)}\) can be interpreted as ancestral lineages. The specific structure of the transition rates allows us to treat a multiple-merger n-coalescent as the restriction of an exchangeable Markovian process \((\varPi _t)_{t\ge 0}\) on the set of partitions of \({\mathbb {N}}\), which is called a multiple-merger coalescent (abbreviated MMC) process. MMC processes are referred to as \(\varLambda \)-coalescents (\(\varLambda \) a finite measure on [0, 1]) [24, 70, 71] if any number of ancestral lineages can merge at any given time, but only one such merger occurs at a time. By way of an example, if \(1 \le \alpha < 2\) in (A28) one obtains a so-called Beta\((2-\alpha ,\alpha )\)-coalescent [72] (Beta-coalescent, see Eq. (A35)). Processes which admit at least two (multiple) mergers at a time are referred to as \(\varXi \)-coalescents (\(\varXi \) a finite measure on the infinite simplex \(\varDelta \)) [64, 65, 76]. See Sect. A2 for details. Specific examples of these MMC processes have been shown to give a better fit to genetic data sampled from Atlantic cod [2, 12, 16, 18, 19] and Japanese sardines [67] than the classical Kingman-coalescent. See e.g. [29] for an overview of inference methods for MMC processes.
[46] review the evidence for sweepstakes reproduction among marine populations and conclude ‘that it plays a major role in shaping marine biodiversity’.
MMC models also arise in contexts other than high fecundity. [17] show that repeated strong bottlenecks in a Wright–Fisher population lead to time-changed Kingman-coalescents which look like \(\varXi \)-coalescents. [27, 28] show that the genealogy of a locus subjected to repeated beneficial mutations is well approximated by a \(\varXi \)-coalescent. [75] provides rigorous justification of the claims of [22, 66] that the genealogy of a population subject to repeated beneficial mutations can be described by the Beta-coalescent with \(\alpha = 1\) (also referred to as the Bolthausen–Sznitman coalescent [20]). These examples show that MMC processes are relevant for biology. We refer the interested reader to e.g. [5, 9, 10, 13, 25, 33] for a more detailed background on coalescent theory.
A2 Coalescent Processes
To keep our presentation self-contained, we now give a precise definition of the coalescent processes we need. We follow the description of [19]. A coalescent process \(\varPi \) is a continuous-time Markov chain on the partitions of \({\mathbb {N}}\). Let \(\varPi ^{(n)}\) denote the restriction to [n], and write \({\mathscr {P}}_n\) for the space of partitions of [n]. A partition \(\pi = \{\pi _1, \ldots , \pi _{\#\pi } \} \in {\mathscr {P}}_n\) has \(\#\pi \) blocks which are disjoint subsets of [n]. We assume the blocks \(\pi _i\) are ordered by their smallest element; therefore we always have \(1 \in \pi _1\). In general a merging event can involve r distinct groups of blocks merging simultaneously. We write \(\underline{k} = (k_1, \ldots , k_r)\) where \(k_i \ge 2\) denotes the number of blocks merging in group i. Here \(r \in [\lfloor \#\pi / 2 \rfloor ]\), \(k_1 + \cdots + k_r \in [\#\pi ]_2\) and \(i_1^{(a)},\ldots , i_{k_a}^{(a)}\) will denote the indices of the blocks in the \(a\hbox {th}\) group. By \(\pi ^\prime \prec _{ \#\pi , \underline{k}} \pi \) we denote a transition from \(\pi \) to \(\pi ^\prime = A\cup B\) where
\[ A = \bigl \{ \pi _\ell : \ell \in [\#\pi ] {\setminus } \{ i_1^{(1)}, \ldots , i_{k_1}^{(1)}, \ldots , i_1^{(r)}, \ldots , i_{k_r}^{(r)} \} \bigr \}, \qquad B = \Bigl \{ \bigcup _{j=1}^{k_a} \pi _{i_j^{(a)}} : a \in [r] \Bigr \}. \tag{A29} \]
In (A29), set A (possibly empty) contains the blocks not involved in a merger, and B contains the r blocks formed by merging the blocks within each of the r groups. By \(\pi ^\prime \prec _{ \#\pi , k} \pi \) we denote the transition in a \(\varLambda \)-coalescent where \(k \in [\#\pi ]_2\) blocks merge in a single merger and \(\pi ^\prime \) is given as in (A29) with \(r = 1\); i.e. only one group of blocks merges in each transition. By \(\pi ^\prime \prec _{\#\pi } \pi \) we denote a transition in the Kingman-coalescent where \(r = 1\) and 2 blocks merge in each transition.
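The transition \(\pi \rightarrow \pi ^\prime = A \cup B\) can be illustrated with a small helper. This is a sketch only; the function name `merge` and the 0-based block indices are assumptions made for the example (the text indexes blocks from 1). Blocks are kept ordered by their smallest element, as in the text.

```python
def merge(partition, groups):
    """Apply one (possibly simultaneous multiple) merger to a partition of [n].

    partition: list of blocks (lists of ints), ordered by smallest element.
    groups:    list of r groups; each group lists >= 2 block indices
               (0-based here, unlike the 1-based indices in the text).
    """
    merging = [i for g in groups for i in g]
    if any(len(g) < 2 for g in groups) or len(set(merging)) != len(merging):
        raise ValueError("groups must be disjoint and contain at least 2 blocks each")
    involved = set(merging)
    # A: blocks not involved in any merger
    A = [partition[i] for i in range(len(partition)) if i not in involved]
    # B: one new block per group, the union of the blocks in that group
    B = [sorted(x for i in g for x in partition[i]) for g in groups]
    return sorted(A + B, key=min)


singletons = [[1], [2], [3], [4], [5]]
print(merge(singletons, [[0, 2], [1, 3]]))   # two simultaneous mergers (r = 2)
print(merge(singletons, [[1, 2, 4]]))        # one multiple merger (r = 1, k = 3)
```

A Kingman transition is the special case of a single group containing exactly two block indices.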
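Now that we have specified the possible transitions, we can state their rates. Let \(\varDelta \) denote the infinite simplex \(\varDelta = \{(x_1, x_2, \ldots ) : x_1 \ge x_2 \ge \ldots \ge 0, \sum _i x_i \le 1\}\); let \(\varvec{x}\) denote an element of \(\varDelta \). Define the functions \(f(\varvec{x};\#\pi ,\underline{k})\) and \(g(\varvec{x};\#\pi )\) on \(\varDelta _{\varvec{0}} := \varDelta {\setminus } \{(0,0,\ldots )\}\), with the empty-product convention \(\prod _{m=1}^0 x_{i_{r+m}} := 1\) and \(s = \#\pi - k_1 - \ldots - k_r\), by
\[ f(\varvec{x};\#\pi ,\underline{k}) = \frac{1}{\sum _j x_j^2} \sum _{\ell = 0}^{s} \binom{s}{\ell } \Bigl (1 - \sum _j x_j\Bigr )^{s - \ell } \sum _{i_1 \ne \cdots \ne i_{r+\ell }} x_{i_1}^{k_1} \cdots x_{i_r}^{k_r} \prod _{m=1}^{\ell } x_{i_{r+m}}, \]
\[ g(\varvec{x};\#\pi ) = \frac{1}{\sum _j x_j^2} \Bigl ( 1 - \sum _{\ell = 0}^{\#\pi } \binom{\#\pi }{\ell } \Bigl (1 - \sum _j x_j\Bigr )^{\#\pi - \ell } \sum _{i_1 \ne \cdots \ne i_{\ell }} x_{i_1} \cdots x_{i_\ell } \Bigr ), \tag{A30} \]
where the inner sums run over tuples of pairwise distinct indices and the product \(x_{i_1} \cdots x_{i_\ell }\) is taken to be 1 for \(\ell = 0\). For a finite measure \(\varXi \) on \(\varDelta \), set \(\varXi _{\varvec{0}}:=\varXi (\cdot \cap \varDelta _{\varvec{0}})\) and \(a:=\varXi (\{(0,0,\ldots )\})\). Then, with \(n = \#\pi \), define
\[ \lambda _{n,\underline{k}} = \int _{\varDelta _{\varvec{0}}} f(\varvec{x}; n, \underline{k})\, \varXi _{\varvec{0}}(d\varvec{x}) + a {\mathbb {1}}_{\{r = 1,\, k_1 = 2\}}, \qquad \lambda _n = \int _{\varDelta _{\varvec{0}}} g(\varvec{x}; n)\, \varXi _{\varvec{0}}(d\varvec{x}) + a \binom{n}{2}. \tag{A31} \]
A \(\varXi \)-coalescent [76] is a continuous-time \({\mathscr {P}}_n\)-valued Markov chain with transition rates \(q_{\pi , \pi ^\prime }\) given by, where \( \lambda _{n,\underline{k} }\) and \(\lambda _n \) are given in (A31),
\[ q_{\pi , \pi ^\prime } = {\left\{ \begin{array}{ll} \lambda _{\#\pi ,\underline{k}} &{} \text {if } \pi ^\prime \prec _{\#\pi , \underline{k}} \pi , \\ -\lambda _{\#\pi } &{} \text {if } \pi ^\prime = \pi , \\ 0 &{} \text {otherwise.} \end{array}\right. } \tag{A32} \]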
A \(\varLambda \)-coalescent [24, 70, 71] is a specific case of a \(\varXi \)-coalescent where \(\varXi _{\varvec{0}}\) only has support on \(\varDelta _0 := \varDelta _{\varvec{0}} \cap \{(x_1, x_2, \ldots ) : x_1 \in (0,1],\, x_{1+i} = 0\,\, \forall \,\, i \in {\mathbb {N}}\}\) [76]. Let \(\varLambda \) denote the restriction of \(\varXi \) to its first coordinate (which makes \(\varLambda \) a finite measure on [0, 1]). The transition rate of \(\pi ^\prime \prec _{ \#\pi , k} \pi \) becomes, where \(\#\pi = n\), \(2 \le k \le n\),
\[ \lambda _{n,k} = \int _0^1 x^{k-2} (1-x)^{n-k}\, \varLambda (dx). \tag{A33} \]
The total rate of k-mergers in a \(\varLambda \)-coalescent is given by \( \lambda _k(n) = \binom{n}{k} \lambda _{n,k}\) for \(2 \le k \le n\). The total rate of mergers given \(n \ge 2\) active blocks is
\[ \lambda (n) = \sum _{k=2}^{n} \lambda _k(n) = \sum _{k=2}^{n} \binom{n}{k} \lambda _{n,k}. \tag{A34} \]
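An important example of a \(\varLambda \)-coalescent is the Beta\((2 - \alpha , \alpha )\)-coalescent [78], in which the measure \(\varLambda \) has the beta density, with \(B(\cdot ,\cdot )\) the beta function,
\[ \varLambda (dx) = \frac{1}{B(2-\alpha ,\alpha )}\, x^{1-\alpha } (1-x)^{\alpha - 1}\, dx, \quad 0< x < 1,\; 0< \alpha < 2. \tag{A35} \]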
The total rate of a k-merger \( \lambda _k(n) = \binom{n}{k} \lambda _{n,k} \) (see Eq. (A33)) is then given by, for \(2 \le k \le n\),
\[ \lambda _k(n) = \binom{n}{k} \frac{B(k - \alpha ,\, n - k + \alpha )}{B(2-\alpha ,\alpha )}. \tag{A36} \]
For \(\alpha = 1\) the Beta\((2 - \alpha ,\alpha )\)-coalescent is the Bolthausen–Sznitman coalescent [20, 39]. The Beta-coalescent is well studied; there are connections to superprocesses, continuous-state branching processes (CSBP) and continuous stable random trees, as described e.g. in [7, 14].
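The merger rates of the Beta\((2-\alpha ,\alpha )\)-coalescent can be evaluated numerically. The sketch below assumes the standard closed form obtained by integrating \(x^{k-2}(1-x)^{n-k}\) against the beta density, namely \(\lambda _{n,k} = B(k-\alpha , n-k+\alpha )/B(2-\alpha ,\alpha )\); the function names are hypothetical. Log-gamma functions are used to avoid overflow for large n.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_merger_rate(n, k, alpha):
    """Total rate of k-mergers among n blocks, for 0 < alpha < 2 and 2 <= k <= n."""
    if not (0.0 < alpha < 2.0 and 2 <= k <= n):
        raise ValueError("need 0 < alpha < 2 and 2 <= k <= n")
    log_rate = log_beta(k - alpha, n - k + alpha) - log_beta(2 - alpha, alpha)
    return comb(n, k) * exp(log_rate)


def total_rate(n, alpha):
    """Total merger rate with n active blocks."""
    return sum(beta_merger_rate(n, k, alpha) for k in range(2, n + 1))


n = 10
for alpha in (1.0, 1.5, 1.9):
    print(alpha, [round(beta_merger_rate(n, k, alpha), 4) for k in (2, 3, n)], total_rate(n, alpha))
```

For \(\alpha = 1\) the expression reduces to \(n/(k(k-1))\), so the total rate telescopes to \(n-1\); this gives a convenient sanity check for the Bolthausen–Sznitman case.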
A3 Goldschmidt and Martin’s Construction of the Bolthausen–Sznitman n-coalescent
From [39], we recall the construction of the Bolthausen–Sznitman n-coalescent by cutting the edges of a random recursive tree. Let \({\mathbb {T}}_n\) be a random recursive tree with n nodes. We can construct \({\mathbb {T}}_n\) sequentially as follows:
- (i) Start with a node labelled 1 (the root) and no edges.
- (ii) If \(i<n\) nodes are present, add a node labelled \(i+1\) and one edge connecting it to a node in [i] picked uniformly at random.
- (iii) Stop when n nodes are present.
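The sequential construction above translates directly into code. In this minimal sketch the tree is stored as a dictionary mapping each non-root node to its parent; the function name and this representation are assumptions made for illustration.

```python
import random


def random_recursive_tree(n, rng):
    """Return a dict mapping each node i in 2..n to its parent in {1, ..., i-1}."""
    parent = {}
    for i in range(1, n):
        # i nodes are present: attach node i + 1 to a uniformly chosen node in [i]
        parent[i + 1] = rng.randint(1, i)
    return parent


tree = random_recursive_tree(8, random.Random(42))
print(tree)
```

Node 1 is the root and has no entry; every other node has a parent with a strictly smaller label.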
The object \({\mathbb {T}}_n\) is a labelled tree in which each node has a single label. We consider a realisation of \({\mathbb {T}}_n\) and transform this tree over time into labelled trees with fewer nodes, whose nodes accumulate multiple labels.
- (i) Each edge of \({\mathbb {T}}_n\) is linked to an exponential clock. The clocks are i.i.d. Exp(1)-distributed.
- (ii) We wait for the first clock to ring. At this time, we cut (remove) the edge whose clock rang first. The tree is thus split into two trees, one of which includes the node with label 1. We denote this tree by \({\mathbb {T}}^{(1)}\) and the other tree by \({\mathbb {T}}^{(2)}\). Let \(e_1\) be the node of \({\mathbb {T}}^{(1)}\) that was connected to the removed edge.
- (iii) All labels of \({\mathbb {T}}^{(2)}\) are added to the set of labels of \(e_1\). Remove \({\mathbb {T}}^{(2)}\) including its clocks.
- (iv) Repeat from (ii), using \({\mathbb {T}}^{(1)}\) labelled as in (iii) with the (remaining) clocks from (i). Stop when \({\mathbb {T}}^{(1)}\) in step (iii) consists of only a single node and no edges.
- (v) For any time t, the label sets at the nodes of \({\mathbb {T}}^{(1)}\) (of \({\mathbb {T}}_n\) before the first clock has rung) give a partition \(\varPi ^{(n)}_t\) of [n]. The process \((\varPi ^{(n)}_t)_{t\ge 0}\) is a Bolthausen–Sznitman n-coalescent (set \(\varPi ^{(n)}_t=\{[n]\}\) if t exceeds the time at which the cutting procedure stopped).
Figure 7 shows an illustration of steps (i)–(iii) for a realisation of \({\mathbb {T}}_5\).
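Steps (i)–(v) of the cutting procedure can be sketched as below. This is an illustrative implementation under assumptions, not the authors' code: the names are hypothetical, each edge is identified by its child endpoint, and the subtree \({\mathbb {T}}^{(2)}\) is taken to be the part hanging below the cut edge (the root carries label 1, so it always stays in \({\mathbb {T}}^{(1)}\)). The function returns the jump times together with the partition after each jump.

```python
import random


def random_recursive_tree(n, rng):
    """Dict mapping each node i in 2..n to a uniformly chosen parent in [i-1]."""
    return {i + 1: rng.randint(1, i) for i in range(1, n)}


def bs_coalescent_by_cutting(n, rng):
    """Return [(time, partition), ...] obtained by cutting a random recursive tree."""
    parent = random_recursive_tree(n, rng)
    children = {v: [] for v in range(1, n + 1)}
    for child, par in parent.items():
        children[par].append(child)
    # (i) one i.i.d. Exp(1) clock per edge, processed in order of ringing
    clocks = sorted((rng.expovariate(1.0), child) for child in parent)
    labels = {v: {v} for v in range(1, n + 1)}   # label set of each surviving node

    def partition():
        return sorted((sorted(block) for block in labels.values()), key=min)

    path = [(0.0, partition())]
    for t, child in clocks:
        if child not in labels:
            continue  # this clock was removed together with an earlier cut subtree
        # (ii) cut the edge above `child`; collect the labels of the subtree below it
        moved, stack = set(), [child]
        while stack:
            v = stack.pop()
            if v in labels:
                moved |= labels.pop(v)
                stack.extend(children[v])
        # (iii) add these labels to the node just above the cut edge
        labels[parent[child]] |= moved
        path.append((t, partition()))
    return path


for time, blocks in bs_coalescent_by_cutting(5, random.Random(3)):
    print(round(time, 3), blocks)
```

The last entry of the returned path is always the single block [n], reached when only the root remains.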
Cite this article
Eldon, B., Freund, F. Genealogical Properties of Subsamples in Highly Fecund Populations. J Stat Phys 172, 175–207 (2018). https://doi.org/10.1007/s10955-018-2013-1