Genealogical Properties of Subsamples in Highly Fecund Populations

Eldon, Bjarki; Freund, Fabian

doi:10.1007/s10955-018-2013-1


Published: 20 March 2018

Journal of Statistical Physics, Volume 172, pages 175–207 (2018)


Abstract

We consider some genealogical properties of nested samples. The complete sample is assumed to have been drawn from a natural population characterised by high fecundity and sweepstakes reproduction (abbreviated HFSR). Owing to our assumption of HFSR, the random gene genealogies of the samples are modelled by coalescent processes which admit multiple mergers of ancestral lineages looking back in time. Among the genealogical properties we consider is the probability that the most recent common ancestor is shared between the complete sample and the subsample nested within it; we also compare the lengths of ‘internal’ branches of nested genealogies between different coalescent processes. The results indicate how ‘informative’ a subsample is about the properties of the larger complete sample, how much information is gained by increasing the sample size, and how the ‘informativeness’ of the subsample varies between different coalescent processes.


References

[1] Agrios, G.: Plant Pathology. Academic Press, Amsterdam (2005)
[2] Árnason, E., Halldórsdóttir, K.: Nucleotide variation and balancing selection at the Ckma gene in Atlantic cod: analysis with multiple merger coalescent models. PeerJ 3, e786 (2015). https://doi.org/10.7717/peerj.786
[3] Arratia, R., Barbour, A.D., Tavaré, S.: Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society (EMS), Zürich (2003)
[4] Barney, B.T., Munkholm, C., Walt, D.R., Palumbi, S.R.: Highly localized divergence within supergenes in Atlantic cod (Gadus morhua) within the Gulf of Maine. BMC Genomics 18(1) (2017). https://doi.org/10.1186/s12864-017-3660-3
[5] Barton, N.H., Etheridge, A.M., Véber, A.: Modelling evolution in a spatial continuum. J. Stat. Mech. 2013(01), P01002 (2013). http://stacks.iop.org/1742-5468/2013/i=01/a=P01002
[6] Basu, A., Majumder, P.P.: A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences. J. Genet. 82(1–2), 7–12 (2003)
[7] Berestycki, J., Berestycki, N., Schweinsberg, J.: Beta-coalescents and continuous stable random trees. Ann. Probab. 35, 1835–1887 (2007)
[8] Berestycki, J., Berestycki, N., Schweinsberg, J.: Small-time behavior of beta coalescents. Ann. Inst. H. Poincaré Probab. Stat. 44, 214–238 (2008)
[9] Berestycki, N.: Recent progress in coalescent theory. Ensaios Matemáticos 16, 1–193 (2009)
[10] Bertoin, J.: Exchangeable coalescents. Cours d’école doctorale, pp. 20–24 (2010)
[11] Bhaskar, A., Clark, A., Song, Y.: Distortion of genealogical properties when the sample size is very large. PNAS 111, 2385–2390 (2014)
[12] Birkner, M., Blath, J.: Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J. Math. Biol. 57, 435–465 (2008)
[13] Birkner, M., Blath, J.: Coalescents and population genetic inference. Trends Stoch. Anal. 353, 329 (2009)
[14] Birkner, M., Blath, J., Capaldo, M., Etheridge, A.M., Möhle, M., Schweinsberg, J., Wakolbinger, A.: Alpha-stable branching and beta-coalescents. Electron. J. Probab. 10, 303–325 (2005)
[15] Birkner, M., Blath, J., Eldon, B.: An ancestral recombination graph for diploid populations with skewed offspring distribution. Genetics 193, 255–290 (2013)
[16] Birkner, M., Blath, J., Eldon, B.: Statistical properties of the site-frequency spectrum associated with $\varLambda $-coalescents. Genetics 195, 1037–1053 (2013)
[17] Birkner, M., Blath, J., Möhle, M., Steinrücken, M., Tams, J.: A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks. ALEA Lat. Am. J. Probab. Math. Stat. 6, 25–61 (2009)
[18] Birkner, M., Blath, J., Steinrücken, M.: Analysis of DNA sequence variation within marine species using Beta-coalescents. Theor. Popul. Biol. 87, 15–24 (2013)
[19] Blath, J., Cronjäger, M.C., Eldon, B., Hammer, M.: The site-frequency spectrum associated with $\varXi $-coalescents. Theor. Popul. Biol. 110, 36–50 (2016). https://doi.org/10.1016/j.tpb.2016.04.002
[20] Bolthausen, E., Sznitman, A.: On Ruelle’s probability cascades and an abstract cavity method. Commun. Math. Phys. 197, 247–276 (1998)
[21] Capra, J.A., Stolzer, M., Durand, D., Pollard, K.S.: How old is my gene? Trends Genet. 29(11), 659–668 (2013)
[22] Desai, M.M., Walczak, A.M., Fisher, D.S.: Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics 193(2), 565–585 (2013)
[23] Dong, R., Gnedin, A., Pitman, J.: Exchangeable partitions derived from Markovian coalescents. Ann. Appl. Probab. 17, 1172–1201 (2007)
[24] Donnelly, P., Kurtz, T.G.: Particle representations for measure-valued population models. Ann. Probab. 27, 166–205 (1999)
[25] Donnelly, P., Tavaré, S.: Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29(1), 401–421 (1995)
[26] Durrett, R.: Probability Models for DNA Sequence Evolution, 2nd edn. Springer, New York (2008)
[27] Durrett, R., Schweinsberg, J.: Approximating selective sweeps. Theor. Popul. Biol. 66, 129–138 (2004)
[28] Durrett, R., Schweinsberg, J.: A coalescent model for the effect of advantageous mutations on the genealogy of a population. Stoch. Proc. Appl. 115, 1628–1657 (2005)
[29] Eldon, B.: Inference methods for multiple merger coalescents. In: Pontarotti, P. (ed.) Evolutionary Biology: Convergent Evolution, Evolution of Complex Traits, Concepts and Methods, pp. 347–371. Springer, New York (2016)
[30] Eldon, B., Birkner, M., Blath, J., Freund, F.: Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents? Genetics 199, 841–856 (2015)
[31] Eldon, B., Wakeley, J.: Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics 172, 2621–2633 (2006)
[32] Eldon, B., Wakeley, J.: Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics 178, 1517–1532 (2008)
[33] Etheridge, A.: Some Mathematical Models from Population Genetics. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-16632-7
[34] Etheridge, A., Griffiths, R.: A coalescent dual process in a Moran model with genic selection. Theor. Popul. Biol. 75, 320–330 (2009)
[35] Etheridge, A.M., Griffiths, R.C., Taylor, J.E.: A coalescent dual process in a Moran model with genic selection, and the Lambda coalescent limit. Theor. Popul. Biol. 78, 77–92 (2010)
[36] Ewens, W.J.: Mathematical Population Genetics 1: Theoretical Introduction, vol. 27. Springer, New York (2012)
[37] Freund, F., Möhle, M.: On the size of the block of 1 for $\varXi $-coalescents with dust. Modern Stoch. Theory Appl. 4(4), 407–425 (2017). https://doi.org/10.15559/17-VMSTA92
[38] Freund, F., Siri-Jégousse, A.: Minimal clade size in the Bolthausen–Sznitman coalescent. J. Appl. Probab. 51(3), 657–668 (2014)
[39] Goldschmidt, C., Martin, J.B.: Random recursive trees and the Bolthausen–Sznitman coalescent. Electron. J. Probab. 10(21), 718–745 (2005)
[40] Griffiths, R.C., Tavaré, S.: Monte Carlo inference methods in population genetics. Math. Comput. Model. 23(8–9), 141–158 (1996)
[41] Griffiths, R.C., Tavaré, S.: The age of a mutation in a general coalescent tree. Commun. Stat. Stoch. Model. 14, 273–295 (1998)
[42] Griswold, C.K., Baker, A.J.: Time to the most recent common ancestor and divergence times of populations of common chaffinches (Fringilla coelebs) in Europe and North Africa: insights into Pleistocene refugia and current levels of migration. Evolution 56(1), 143–153 (2002)
[43] Halldórsdóttir, K., Árnason, E.: Whole-genome sequencing uncovers cryptic and hybrid species among Atlantic and Pacific cod-fish (2015). https://www.biorxiv.org/content/early/2015/12/20/034926
[44] Hintze, J.L., Nelson, R.D.: Violin plots: a box plot-density trace synergism. Am. Stat. 52(2), 181–184 (1998). https://doi.org/10.1080/00031305.1998.10480559
[45] Hedgecock, D.: Does variance in reproductive success limit effective population sizes of marine organisms? In: Beaumont, A. (ed.) Genetics and Evolution of Aquatic Organisms, pp. 122–134. Chapman and Hall, London (1994)
[46] Hedgecock, D., Pudovkin, A.I.: Sweepstakes reproductive success in highly fecund marine fish and shellfish: a review and commentary. Bull. Mar. Sci. 87, 971–1002 (2011)
[47] Hedrick, P.: Large variance in reproductive success and the ${N}_e/{N}$ ratio. Evolution 59(7), 1596 (2005). https://doi.org/10.1554/05-009
[48] Hénard, O.: The fixation line in the ${\varLambda }$-coalescent. Ann. Appl. Probab. 25(5), 3007–3032 (2015)
[49] Herriger, P., Möhle, M.: Conditions for exchangeable coalescents to come down from infinity. Alea 9(2), 637–665 (2012)
[50] Hird, S., Kubatko, L., Carstens, B.: Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling. Mol. Phylogenetics Evol. 57(2), 888–898 (2010)
[51] Hovmøller, M.S., Sørensen, C.K., Walter, S., Justesen, A.F.: Diversity of Puccinia striiformis on cereals and grasses. Annu. Rev. Phytopathol. 49, 197–217 (2011)
[52] Hudson, R.R.: Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23, 183–201 (1983)
[53] Huillet, T., Möhle, M.: On the extended Moran model and its relation to coalescents with multiple collisions. Theor. Popul. Biol. 87, 5–14 (2013)
[54] Kaj, I., Krone, S.M.: The coalescent process in a population with stochastically varying size. J. Appl. Probab. 40(01), 33–48 (2003)
[55] King, L., Wakeley, J.: Empirical Bayes estimation of coalescence times from nucleotide sequence data. Genetics 204(1), 249–257 (2016). https://doi.org/10.1534/genetics.115.185751
[56] Kingman, J.F.C.: The coalescent. Stoch. Proc. Appl. 13, 235–248 (1982)
[57] Kingman, J.F.C.: Exchangeability and the evolution of large populations. In: Koch, G., Spizzichino, F. (eds.) Exchangeability in Probability and Statistics, pp. 97–112. North-Holland, Amsterdam (1982)
[58] Kingman, J.F.C.: On the genealogy of large populations. J. Appl. Probab. 19A, 27–43 (1982)
[59] Li, G., Hedgecock, D.: Genetic heterogeneity, detected by PCR-SSCP, among samples of larval Pacific oysters (Crassostrea gigas) supports the hypothesis of large variance in reproductive success. Can. J. Fish. Aquat. Sci. 55(4), 1025–1033 (1998). https://doi.org/10.1139/f97-312
[60] May, A.W.: Fecundity of Atlantic cod. J. Fish. Res. Board Can. 24, 1531–1551 (1967)
[61] Möhle, M.: Robustness results for the coalescent. J. Appl. Probab. 35(02), 438–447 (1998)
[62] Möhle, M.: On sampling distributions for coalescent processes with simultaneous multiple collisions. Bernoulli 12(1), 35–53 (2006)
[63] Möhle, M.: Coalescent processes derived from some compound Poisson population models. Electron. Commun. Probab. 16, 567–582 (2011)
[64] Möhle, M., Sagitov, S.: A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29, 1547–1562 (2001)
[65] Möhle, M., Sagitov, S.: Coalescent patterns in diploid exchangeable population models. J. Math. Biol. 47, 337–352 (2003)
[66] Neher, R.A., Hallatschek, O.: Genealogies of rapidly adapting populations. Proc. Natl. Acad. Sci. 110(2), 437–442 (2013)
[67] Niwa, H.S., Nashida, K., Yanagimoto, T.: Reproductive skew in Japanese sardine inferred from DNA sequences. ICES J. Mar. Sci. 73(9), 2181–2189 (2016). https://doi.org/10.1093/icesjms/fsw070
[68] Oosthuizen, E., Daan, N.: Egg fecundity and maturity of North Sea cod, Gadus morhua. Neth. J. Sea Res. 8(4), 378–397 (1974)
[69] Pettengill, J.B.: The time to most recent common ancestor does not (usually) approximate the date of divergence. PLoS ONE 10(8), e0128407 (2015)
[70] Pitman, J.: Coalescents with multiple collisions. Ann. Probab. 27, 1870–1902 (1999)
[71] Sagitov, S.: The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Probab. 36, 1116–1125 (1999)
[72] Sagitov, S.: Convergence to the coalescent with simultaneous mergers. J. Appl. Probab. 40, 839–854 (2003)
[73] Sargsyan, O., Wakeley, J.: A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theor. Popul. Biol. 74, 104–114 (2008)
[74] Saunders, I.W., Tavaré, S., Watterson, G.A.: On the genealogy of nested subsamples from a haploid population. Adv. Appl. Probab. 16(3), 471 (1984). https://doi.org/10.2307/1427285
[75] Schweinsberg, J.: Rigorous results for a population model with selection II: genealogy of the population. Electron. J. Probab. (2017). https://doi.org/10.1214/17-EJP58
[76] Schweinsberg, J.: Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5, 1–50 (2000)
[77] Schweinsberg, J.: A necessary and sufficient condition for the $\varLambda $-coalescent to come down from infinity. Electron. Commun. Probab. 5, 1–11 (2000)
[78] Schweinsberg, J.: Coalescent processes obtained from supercritical Galton–Watson processes. Stoch. Proc. Appl. 106, 107–139 (2003)
[79] Simon, M., Cordo, C.: Inheritance of partial resistance to Septoria tritici in wheat (Triticum aestivum): limitation of pycnidia and spore production. Agronomie 17(6–7), 343–347 (1997)
[80] Slack, R.: A branching process with mean one and possibly infinite variance. Probab. Theory Relat. Fields 9(2), 139–145 (1968)
[81] Spouge, J.L.: Within a sample from a population, the distribution of the number of descendants of a subsample’s most recent common ancestor. Theor. Popul. Biol. 92, 51–54 (2014)
[82] Tajima, F.: Evolutionary relationships of DNA sequences in finite populations. Genetics 105, 437–460 (1983)
[83] Timm, A., Yin, J.: Kinetics of virus production from single cells. Virology 424(1), 11–17 (2012)
[84] Wakeley, J.: Coalescent Theory. Roberts & Co, Greenwood Village (2007)
[85] Wakeley, J., Takahashi, T.: Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 20, 208–213 (2003)
[86] Waples, R.S.: Tiny estimates of the ${N_e}/{N}$ ratio in marine fishes: are they real? J. Fish Biol. 89(6), 2479–2504 (2016). https://doi.org/10.1111/jfb.13143
[87] Wiuf, C., Donnelly, P.: Conditional genealogies and the age of a neutral mutant. Theor. Popul. Biol. 56(2), 183–201 (1999). https://doi.org/10.1006/tpbi.1998.1411


Acknowledgements

We thank Alison Etheridge for many very valuable comments and suggestions, especially regarding Theorem 1. BE was funded by DFG grant STE 325/17-1 to Wolfgang Stephan through Priority Programme SPP1819: Rapid Evolutionary Adaptation. FF was funded by DFG grant FR 3633/2-1 through Priority Programme 1590: Probabilistic Structures in Evolution.

Author information

Authors and Affiliations

Museum für Naturkunde, 43 Invalidenstraße, 10115, Berlin, Germany
Bjarki Eldon
University of Hohenheim, Institute 350b, Fruwirthstraße 21, 70599, Stuttgart, Germany
Fabian Freund


Corresponding author

Correspondence to Bjarki Eldon.

Appendices

A1 Population Models

In this section we give a brief overview of the population models underlying the coalescent processes we consider, and explain why we think they are interesting. The coalescent processes themselves are described in detail in Sect. A2.

A universal mechanism among all biological populations is reproduction and inheritance. Reproduction refers to the generation of offspring, and inheritance refers to the transmission of the information necessary for viability and reproduction. Mendel’s laws on the independent segregation of chromosomes into gametes describe the transmission of information from a parent to an offspring in a diploid population. For our purposes, however, it suffices to consider haploid populations, in which an individual can be thought of as a single gene copy. By tracing gene copies as they are passed on from one generation to the next, one automatically records two sets of information. On the one hand, one records how the frequencies of genetic types change going forwards in time; on the other hand, one keeps track of the ancestral, or genealogical, relations among the different copies. This duality has been exploited successfully, for example in modelling selection [34, 35].

To model genetic variation in natural populations one requires a mathematically tractable model of how genetic information is passed from parents to offspring. In the Wright–Fisher model, offspring choose their parents independently and uniformly at random. Suppose we are tracing the ancestry of $n \ge 2$ gene copies in a haploid Wright–Fisher population of N gene copies in total. For any pair, the probability that both copies have the same parent in the previous generation is $1/N$. Informally, we must therefore trace the genealogy of our gene copies back for ${\mathscr {O}}(N)$ generations before we see the first merger, i.e. the first time at least 2 gene copies (or their ancestral lines) find a common ancestor. If n is small relative to N then, when a merger occurs, it involves just two ancestral lineages with probability $1-{\mathscr {O}}(1/N)$.
This means that if we measure time in units of N generations, and assume N is very large, the random ancestral relations of our sampled gene copies can be described by a continuous-time Markov chain in which each pair of ancestral lines merges at rate 1 and no other mergers are possible. We have, in an informal way, arrived at the Kingman-coalescent [56,57,58]. One can derive the Kingman-coalescent not just from the Wright–Fisher model but from any population model which satisfies certain assumptions on the offspring distribution [61, 64, 71]. These assumptions mainly dictate that higher moments of the offspring number distribution are small relative to (an appropriate power of) the population size. The Kingman-coalescent, and its various extensions, are used almost universally as the ‘null model’ for a gene genealogy in population genetics. The Kingman-coalescent is a remarkably good model for populations characterised by low fecundity, i.e. whose individuals have small numbers of offspring relative to the population size.
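The two steps of this heuristic can be checked numerically. The sketch below is our own illustration rather than code from the paper, and the function names are hypothetical. `wf_pair_no_merge` gives the exact probability $(1-1/N)^{\lfloor Nt\rfloor}$ that a pair of lineages in a haploid Wright–Fisher population has not merged within $\lfloor Nt\rfloor$ generations, which approaches the Kingman value $e^{-t}$ as N grows. `kingman_expected_tmrca` sums the expected waiting times $1/\binom{k}{2}$ while k lineages remain, which gives the expected time to the most recent common ancestor of n lineages, $2(1-1/n)$, in units of N generations.

```python
from fractions import Fraction
from math import comb, exp, floor


def wf_pair_no_merge(N: int, t: float) -> float:
    """P(two lineages in a haploid Wright-Fisher population of size N have
    not found a common ancestor within floor(N*t) generations).
    In each generation they pick the same parent with probability 1/N."""
    return (1.0 - 1.0 / N) ** floor(N * t)


def kingman_expected_tmrca(n: int) -> Fraction:
    """Expected time (in units of N generations) to the MRCA of n lineages
    under the Kingman-coalescent: while k lineages remain, each of the
    C(k, 2) pairs merges at rate 1, so the waiting time is Exp(C(k, 2))."""
    return sum(Fraction(1, comb(k, 2)) for k in range(2, n + 1))


if __name__ == "__main__":
    for N in (10, 100, 10_000):
        print(N, wf_pair_no_merge(N, 1.0), exp(-1.0))
    print(kingman_expected_tmrca(10))  # 9/5, i.e. 2 * (1 - 1/10)
```

Exact rational arithmetic is used for the Kingman sum so that the identity $\sum_{k=2}^n 1/\binom{k}{2} = 2(1-1/n)$ can be checked without rounding error.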

The classical Kingman-coalescent is derived from a population model in which the population size is constant between generations. Extensions to stochastically varying population size, in which the population size does not vary ‘too much’ between generations, have been made [54]; the result is a time-changed Kingman-coalescent. Probably the most commonly applied model of deterministically changing population size is the model of exponential population growth (see e.g. [25, 30, 41]). In each generation the population size is multiplied by a factor $(1+\beta /N)$, where $\beta > 0$. Taking N to be the population size in generation 0, the population size in generation k going forward in time is therefore $N_k = N(1 + \beta / N)^k$. If generation 0 is the present (the time of sampling), the population size $\lfloor Nt\rfloor $ generations ago is $N(1+\beta /N)^{-\lfloor Nt\rfloor }$, which for large N is approximately $Ne^{-\beta t}$, since $(1+\beta /N)^{\lfloor Nt\rfloor } \rightarrow e^{\beta t}$ as $N \rightarrow \infty $. [30] show that exponential population growth can be distinguished from multiple-merger coalescents (in which at least three ancestral lineages can merge simultaneously), derived from population models of high fecundity and sweepstakes reproduction, using population genetic data from a single locus, provided that the sample size and the number of mutations (segregating sites) are not too small.
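A small numerical sketch of the large-N approximation for exponential growth follows. It assumes that N is the present population size and that generations are counted backwards from the time of sampling; under that assumption the size g generations ago is $N(1+\beta/N)^{-g}$. The function names are hypothetical. For $g = \lfloor Nt\rfloor$ the ratio of the exact size to $Ne^{-\beta t}$ tends to 1 as N grows.

```python
from math import exp, floor


def size_generations_ago(N: float, beta: float, g: int) -> float:
    """Population size g generations before the present, assuming the size is
    multiplied by (1 + beta/N) every generation and equals N at present."""
    return N * (1.0 + beta / N) ** (-g)


def size_limit(N: float, beta: float, t: float) -> float:
    """Large-N approximation N * exp(-beta * t) for g = floor(N * t)."""
    return N * exp(-beta * t)


if __name__ == "__main__":
    beta, t = 2.0, 0.5
    for N in (100, 10_000, 1_000_000):
        exact = size_generations_ago(N, beta, floor(N * t))
        print(N, exact / size_limit(N, beta, t))  # ratio tends to 1 as N grows
```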

Highly fecund populations occur in diverse groups of organisms, including some marine organisms [46], fungi [1, 51, 79] and viruses [83]. By way of example, individual Atlantic codfish [60, 68] and Pacific oysters [59] can lay millions of eggs. This high fecundity counteracts the high mortality rate among the larvae (juveniles) of these populations (Type III survivorship). The term ‘sweepstakes reproduction’ has been proposed to describe the reproduction mode of highly fecund populations with Type III survivorship [45]. Population models which admit high fecundity and sweepstakes reproduction (HFSR) through skewed or heavy-tailed offspring number distributions have been developed [31, 53, 64, 65, 73, 78]. In the haploid model of [78], each individual independently contributes a random number X of juveniles, where, for constants $C, \alpha > 0$,

$$\begin{aligned} {\mathbb {P}}\left( X \ge k \right) \sim \frac{C}{k^\alpha }, \quad k \rightarrow \infty , \end{aligned}$$

(A28)

and $x_n \sim y_n$ means $x_n/y_n \rightarrow 1$ as $n\rightarrow \infty $. The constant $C > 0$ is a normalising constant, and the constant $\alpha $ determines the skewness of the distribution (smaller values of $\alpha $ give heavier tails). The next generation of individuals is then formed by sampling (uniformly without replacement) from the pool of juveniles. In the case $\alpha < 2$ the random ancestral relations of gene copies can be described by specific forms of multiple-merger coalescent processes [72]. We remark that the fates of the juveniles need not be correlated to generate multiple mergers in the genealogies: the heavy-tailed distribution of juveniles means that occasionally one ‘lucky’ individual contributes a huge number of juveniles while all others contribute only a small number. Uniform sampling without replacement from the pool of juveniles then means that the lucky individual leaves significantly more descendants in the next generation than anyone else, and this is what generates multiple mergers of ancestral lines.
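To make the mechanism concrete, the sketch below simulates one generation of a haploid model of this type. The juvenile distribution $X = \lfloor U^{-1/\alpha}\rfloor$ with U uniform on (0, 1] is our own choice, not necessarily the one used in [78]: it satisfies $P(X \ge k) = k^{-\alpha}$ for all integers $k \ge 1$, i.e. (A28) with $C = 1$. Sampling N juveniles without replacement is done by drawing N distinct juvenile indices and mapping each back to its parent with a binary search, so the possibly huge juvenile pool is never stored. Function names are hypothetical.

```python
import bisect
import itertools
import random


def juvenile_count(alpha: float, rng: random.Random) -> int:
    """Draw X = floor(U**(-1/alpha)) with U uniform on (0, 1];
    then P(X >= k) = k**(-alpha) for every integer k >= 1."""
    u = 1.0 - rng.random()  # rng.random() lies in [0, 1), so u lies in (0, 1]
    return int(u ** (-1.0 / alpha))


def one_generation(N: int, alpha: float, rng: random.Random) -> list[int]:
    """Offspring numbers of the N parents when the next generation is formed
    by sampling N juveniles uniformly without replacement from the pool."""
    juveniles = [juvenile_count(alpha, rng) for _ in range(N)]
    # cumulative[i] = number of juveniles produced by parents 0..i
    cumulative = list(itertools.accumulate(juveniles))
    total = cumulative[-1]  # at least N, since every parent has X >= 1
    offspring = [0] * N
    for j in rng.sample(range(total), N):  # N distinct juvenile indices
        parent = bisect.bisect_right(cumulative, j)  # parent owning juvenile j
        offspring[parent] += 1
    return offspring


if __name__ == "__main__":
    rng = random.Random(1)
    N = 1000
    for alpha in (1.2, 1.9):
        counts = one_generation(N, alpha, rng)
        # share of the next generation descending from the 'luckiest' parent
        print(alpha, max(counts) / N)
```

Passing `range(total)` to `random.sample` keeps memory proportional to N rather than to the total number of juveniles, which matters because a single draw of X can be very large when $\alpha$ is close to 1.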

Coalescent processes derived from population models of HFSR (see (A28) for an example) admit multiple mergers of ancestral lineages [24, 63, 65, 70,71,72, 76]. Mathematically, we consider exchangeable n-coalescent processes, which are Markovian processes $(\varPi _t^{(n)})_{t\ge 0}$ on the set of partitions of $[n] := \{1,2,\ldots , n\}$ whose transitions are mergers of partition blocks (a ‘block’ is a subset of [n], see Sect. A2) with rates specified in Sect. A2. The blocks of $\varPi _t^{(n)}$ show which individuals in [n] share a common ancestor at time t, measured backwards from the time of sampling. Thus, the blocks of $\varPi _t^{(n)}$ can be interpreted as ancestral lineages. The specific structure of the transition rates allows one to treat a multiple-merger n-coalescent as the restriction to [n] of an exchangeable Markovian process $(\varPi _t)_{t\ge 0}$ on the set of partitions of ${\mathbb {N}}$, which is called a multiple-merger coalescent (abbreviated MMC) process. MMC processes are referred to as $\varLambda $-coalescents ($\varLambda $ a finite measure on [0, 1]) [24, 70, 71] if any number of ancestral lineages can merge at any given time, but only one such merger occurs at a time. For example, if $1 \le \alpha < 2$ in (A28) one obtains a so-called Beta$(2-\alpha ,\alpha )$-coalescent [72] (Beta-coalescent, see Eq. (A35)). Processes which admit two or more simultaneous (multiple) mergers are referred to as $\varXi $-coalescents ($\varXi $ a finite measure on the infinite simplex $\varDelta $) [64, 65, 76]. See Sect. A2 for details. Specific examples of these MMC processes have been shown to give a better fit than the classical Kingman-coalescent to genetic data sampled from Atlantic cod [2, 12, 16, 18, 19] and Japanese sardines [67]. See e.g. [29] for an overview of inference methods for MMC processes. [46] review the evidence for sweepstakes reproduction among marine populations and conclude ‘that it plays a major role in shaping marine biodiversity’.

MMC models also arise in contexts other than high fecundity. [17] show that repeated strong bottlenecks in a Wright–Fisher population lead to time-changed Kingman-coalescents which look like $\varXi $-coalescents. [27, 28] show that the genealogy of a locus subjected to repeated beneficial mutations is well approximated by a $\varXi $-coalescent. [75] provides rigorous justification of the claims of [22, 66] that the genealogy of a population subject to repeated beneficial mutations can be described by the Beta-coalescent with $\alpha = 1$ (also referred to as the Bolthausen–Sznitman coalescent [20]). These examples show that MMC processes are relevant for biology. We refer the interested reader to e.g. [5, 9, 10, 13, 25, 33] for a more detailed background on coalescent theory.

A2 Coalescent Processes

To keep our presentation self-contained, we now give precise definitions of the coalescent processes we need. We follow the description of [19]. A coalescent process $\varPi $ is a continuous-time Markov chain on the partitions of ${\mathbb {N}}$. Let $\varPi ^{(n)}$ denote its restriction to [n], and write ${\mathscr {P}}_n$ for the space of partitions of [n]. A partition $\pi = \{\pi _1, \ldots , \pi _{\#\pi } \} \in {\mathscr {P}}_n$ has $\#\pi $ blocks, which are disjoint subsets of [n]. We assume the blocks $\pi _i$ are ordered by their smallest element; therefore we always have $1 \in \pi _1$. In general a merging event can involve r distinct groups of blocks merging simultaneously. We write $\underline{k} = (k_1, \ldots , k_r)$, where $k_i \ge 2$ denotes the number of blocks merging in group i. Here $r \in [\lfloor \#\pi / 2 \rfloor ]$, $k_1 + \cdots + k_r \in [\#\pi ]_2$, where $[m]_2 := \{2, \ldots , m\}$, and $i_1^{(a)},\ldots , i_{k_a}^{(a)}$ denote the indices of the blocks in the $a\hbox {th}$ group. By $\pi ^\prime \prec _{ \#\pi , \underline{k}} \pi $ we denote a transition from $\pi $ to $\pi ^\prime = A\cup B$, where

$$\begin{aligned} \begin{aligned} A =\,&\left\{ \pi _\ell : \ell \in [\#\pi ], \ell \notin \bigcup _{a=1}^r \left\{ i_{1}^{(a)}, \ldots , i_{k_a}^{(a)} \right\} \right\} , \\ B =\,&\left\{ \pi _{i_{1}^{(b)}} \cup \cdots \cup \pi _{i_{k_b}^{(b)}} : b \in [r] \right\} . \\ \end{aligned} \end{aligned}$$

(A29)

In (A29), the set A (possibly empty) contains the blocks not involved in a merger, and B contains the r new blocks, each formed by merging the $k_b$ blocks of group b. By $\pi ^\prime \prec _{ \#\pi , k} \pi $ we denote the transition in a $\varLambda $-coalescent in which $k \in [\#\pi ]_2$ blocks merge in a single merger and $\pi ^\prime $ is given by (A29) with $r = 1$, i.e. only one group of blocks merges in each transition. By $\pi ^\prime \prec _{\#\pi } \pi $ we denote a transition in the Kingman-coalescent, where $r = 1$ and 2 blocks merge in each transition.
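The transition in (A29) can be sketched as a small operation on partitions stored as lists of blocks. This is a hypothetical helper for illustration only: blocks are Python `frozenset`s, block indices are 0-based (the text uses 1-based indices), and the result is re-sorted by smallest element so that the block containing 1 always comes first.

```python
Partition = list[frozenset[int]]


def ordered(blocks) -> Partition:
    """Order blocks by their smallest element, so 1 is always in the first block."""
    return sorted((frozenset(b) for b in blocks), key=min)


def merge(pi: Partition, groups: list[list[int]]) -> Partition:
    """Return pi' as in (A29). Blocks whose (0-based) index occurs in no group
    are kept (the set A); each group of >= 2 block indices is merged into one
    new block (the set B). Groups must be pairwise disjoint."""
    used = [i for grp in groups for i in grp]
    if len(used) != len(set(used)) or any(len(grp) < 2 for grp in groups):
        raise ValueError("groups must be disjoint and each of size >= 2")
    used_set = set(used)
    A = [b for i, b in enumerate(pi) if i not in used_set]
    B = [frozenset().union(*(pi[i] for i in grp)) for grp in groups]
    return ordered(A + B)


if __name__ == "__main__":
    pi = ordered([{1}, {2}, {3}, {4}, {5}])
    # Xi-type transition with r = 2, k = (2, 2): {1,3} and {2,4} form simultaneously
    print(merge(pi, [[0, 2], [1, 3]]))
    # Lambda-type transition with r = 1, k = 3
    print(merge(pi, [[1, 2, 4]]))
```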

Now that we have specified the possible transitions, we can state their rates. Let $\varDelta $ denote the infinite simplex $\varDelta = \{(x_1, x_2, \ldots ) : x_1 \ge x_2 \ge \ldots \ge 0, \sum _i x_i \le 1\}$, and let $\varvec{x}$ denote an element of $\varDelta $. With $s := \#\pi - k_1 - \cdots - k_r$ and the convention $\prod _{m=1}^0 x_{i_{r+m}} := 1$, define the functions $f(\varvec{x};\#\pi ,\underline{k})$ and $g(\varvec{x};n)$ on $\varDelta _{\varvec{0}} := \varDelta {\setminus } \{(0,0,\ldots )\}$ by

$$\begin{aligned} \begin{aligned} f(\varvec{x};\#\pi ,\underline{k}) =\,&\frac{1}{\sum _j x_j^2 } \sum _{\ell = 0}^s \sum _{i_1 \ne \ldots \ne i_{r + \ell } } \left( {\begin{array}{c}s \\ \ell \end{array}}\right) x_{i_1}^{k_1}\cdots x_{i_r}^{k_r} \prod _{m=1}^\ell x_{i_{r+m}} \left( 1 - \sum _j x_j \right) ^{s - \ell }, \\ g(\varvec{x};n ) =\,&\frac{1 - \sum \limits _{\ell = 0 }^{n } \sum \limits _{i_1 \ne \ldots \ne i_{\ell } } \left( {\begin{array}{c} n \\ \ell \end{array}}\right) x_{i_1}\cdots x_{i_\ell } \left( 1 - \sum _j x_j \right) ^{n - \ell } }{\sum _j x_j^2 }. \\ \end{aligned} \end{aligned}$$

(A30)

where, for $\ell = 0$, the empty product $x_{i_1}\cdots x_{i_\ell }$ in g is set to 1. For a finite measure $\varXi $ on $\varDelta $, set $\varXi _{\varvec{0}}:=\varXi (\cdot \cap \varDelta _{\varvec{0}})$ and $a:=\varXi (\{(0,0,\ldots )\})$. Then, define

$$\begin{aligned} \begin{aligned} \lambda _{n,\underline{k} } :=\,&\int _{ \varDelta _{\varvec{0}}} f(\varvec{x};n,\underline{k} )\, \varXi _{\varvec{0}}(d\varvec{x}) + a{\mathbb {1}}_{\{r=1,\,k_1=2\}}, \\ \lambda _{n} :=\,&\int _{ \varDelta _{\varvec{0}}} g(\varvec{x};n)\, \varXi _{\varvec{0}}(d\varvec{x}) + a\left( {\begin{array}{c}n\\ 2\end{array}}\right) . \end{aligned} \end{aligned}$$

(A31)
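For a point $\varvec{x}$ with only finitely many nonzero coordinates, the integrands in (A30) can be evaluated by brute force. The sketch below is a hypothetical helper for small examples only: it assumes finite support, passes the nonzero coordinates as a list, and enumerates ordered tuples of distinct indices with `itertools.permutations`, so its cost grows quickly with the number of coordinates and blocks. Two sanity checks follow from (A30) and (A33): with a single nonzero coordinate x, $f$ reduces to $x^{k-2}(1-x)^{n-k}$, and $g(\varvec{x};n)$ equals the sum of f over all possible mergers.

```python
from itertools import permutations
from math import comb, prod


def _sq_norm(x):
    return sum(xi * xi for xi in x)


def f(x, n, ks):
    """Integrand f(x; n, k) of (A30) for a specific merger of groups with sizes
    ks = (k_1, ..., k_r) among n blocks; x holds the nonzero coordinates."""
    r = len(ks)
    s = n - sum(ks)  # blocks not involved in the merger
    dust = 1.0 - sum(x)
    total = 0.0
    for ell in range(s + 1):
        # ordered tuples of r + ell pairwise distinct coordinate indices
        for idx in permutations(range(len(x)), r + ell):
            term = prod(x[i] ** k for i, k in zip(idx[:r], ks))
            term *= prod(x[i] for i in idx[r:])  # empty product is 1
            total += comb(s, ell) * term * dust ** (s - ell)
    return total / _sq_norm(x)


def g(x, n):
    """Integrand g(x; n) of (A30): one minus the probability that none of the
    n blocks merge, divided by the sum of squared coordinates."""
    dust = 1.0 - sum(x)
    no_merger = 0.0
    for ell in range(n + 1):
        for idx in permutations(range(len(x)), ell):
            no_merger += comb(n, ell) * prod(x[i] for i in idx) * dust ** (n - ell)
    return (1.0 - no_merger) / _sq_norm(x)


if __name__ == "__main__":
    print(f([0.3], 5, (3,)), 0.3 * 0.7 ** 2)  # single coordinate: x^(k-2) (1-x)^(n-k)
    print(g([0.3, 0.2], 2))                   # any pair merger: g(x; 2) = 1
```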

The restriction to [n] of a $\varXi $-coalescent [76] is a continuous-time ${\mathscr {P}}_n$-valued Markov chain with transition rates $q_{\pi , \pi ^\prime }$ given by (with $ \lambda _{n,\underline{k} }$ and $\lambda _n $ as in (A31))

$$\begin{aligned} q_{\pi , \pi ^\prime } = {\left\{ \begin{array}{ll} \lambda _{n,\underline{k} } &{} \text {if }\pi ^\prime \prec _{ \#\pi , \underline{k}} \pi , \#\pi = n, \\ -\lambda _n &{} \text {if }\pi ^\prime = \pi \text { and }n = \#\pi , \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

(A32)

A $\varLambda $-coalescent [24, 70, 71] is the special case of a $\varXi $-coalescent in which $\varXi _{\varvec{0}}$ is supported on $\varDelta _0 := \varDelta _{\varvec{0}} \cap \{(x_1, x_2, \ldots ) : x_1 \in (0,1],\, x_{1+i} = 0\,\, \forall \,\, i \in {\mathbb {N}}\}$ [76]. Let $\varLambda $ denote the restriction of $\varXi $ to its first coordinate (which makes $\varLambda $ a finite measure on [0, 1]). For $\#\pi = n$ and $2 \le k \le n$, the transition rate of $\pi ^\prime \prec _{ \#\pi , k} \pi $ becomes

$$\begin{aligned} \lambda _{n,k} = \int _0^1 x^{k-2}(1-x)^{n-k}\varLambda (dx), \quad 2 \le k \le n. \end{aligned}$$

(A33)
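The step from (A30) to (A33) can be made explicit. Take $\varvec{x} = (x, 0, 0, \ldots ) \in \varDelta _0$ and a single k-merger among $n = \#\pi $ blocks, i.e. $r = 1$, $k_1 = k$ and $s = n - k$. Every term with $\ell \ge 1$ contains the factor $x_{i_1}^{k} x_{i_2}$ with $i_2 \ne i_1$, so at least one of $x_{i_1}$, $x_{i_2}$ is zero. Only the term $\ell = 0$, $i_1 = 1$ survives, and

$$\begin{aligned} f(\varvec{x};n,(k)) = \frac{x^{k}(1-x)^{n-k}}{x^{2}} = x^{k-2}(1-x)^{n-k}. \end{aligned}$$

Integrating against $\varLambda (dx)$ on (0, 1] gives (A33). The atom $a = \varLambda (\{0\})$ contributes $a{\mathbb {1}}_{\{k=2\}}$, which matches the integrand at $x = 0$ under the convention $0^0 = 1$. For $r \ge 2$ the product $x_{i_1}^{k_1}x_{i_2}^{k_2}$ always contains a zero factor, so simultaneous mergers have rate zero in a $\varLambda $-coalescent.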

The total rate of k-mergers in a $\varLambda $-coalescent is $ \lambda _k(n) = \left( {\begin{array}{c}n\\ k\end{array}}\right) \lambda _{n,k}$ for $2 \le k \le n$, and the total rate of mergers when $n \ge 2$ blocks are active is

$$\begin{aligned} \lambda (n) = \lambda _2(n) + \cdots + \lambda _n(n). \end{aligned}$$

(A34)
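The rates (A33) and (A34) fully determine how the number of active blocks evolves: with n blocks the chain waits an Exp$(\lambda (n))$ time and then performs a k-merger with probability $\lambda _k(n)/\lambda (n)$, leaving $n - k + 1$ blocks. The sketch below simulates this block-counting process up to the most recent common ancestor. It assumes, purely for illustration, the point-mass choice $\varLambda = \delta _\psi $ with $\psi = 0.5$, for which (A33) gives $\lambda _{n,k} = \psi ^{k-2}(1-\psi )^{n-k}$; the function names are hypothetical, and any function returning $\lambda _{n,k}$ can be passed in instead.

```python
import random
from math import comb


def dirac_lambda_nk(n, k, psi=0.5):
    """lambda_{n,k} of (A33) for the hypothetical choice Lambda = delta_psi, 0 < psi <= 1."""
    return psi ** (k - 2) * (1 - psi) ** (n - k)


def merger_rates(n, lam_nk):
    """{k: lambda_k(n)} with lambda_k(n) = C(n, k) * lambda_{n,k} for 2 <= k <= n."""
    return {k: comb(n, k) * lam_nk(n, k) for k in range(2, n + 1)}


def total_rate(n, lam_nk):
    """lambda(n) of (A34): total merger rate with n >= 2 active blocks."""
    return sum(merger_rates(n, lam_nk).values())


def time_to_mrca(n, lam_nk, rng=random):
    """Simulate the number of blocks until a single block (the MRCA) remains."""
    t = 0.0
    while n > 1:
        rates = merger_rates(n, lam_nk)
        t += rng.expovariate(sum(rates.values()))  # holding time ~ Exp(lambda(n))
        ks, weights = zip(*rates.items())
        k = rng.choices(ks, weights=weights)[0]  # k-merger w.p. lambda_k(n) / lambda(n)
        n -= k - 1  # k blocks are replaced by one
    return t


rng = random.Random(1)
print(total_rate(5, dirac_lambda_nk))  # 3.25 for psi = 0.5
print(time_to_mrca(10, dirac_lambda_nk, rng))
```

Keeping $\lambda _{n,k}$ as a parameter separates the model (the measure $\varLambda $) from the simulation loop, so the same loop works for a Beta-coalescent once a suitable `lam_nk` is supplied.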

An important example of a $\varLambda $-coalescent is the Beta$(2 - \alpha , \alpha )$-coalescent [78], in which $\varLambda $ is the Beta$(2-\alpha ,\alpha )$ distribution; with $B(\cdot ,\cdot )$ denoting the beta function,

$$\begin{aligned} \varLambda (dx) = \frac{ x^{1-\alpha }(1-x)^{\alpha - 1} }{B(2-\alpha ,\alpha )}dx, \quad 1 \le \alpha < 2. \end{aligned}$$

(A35)

By (A33), the total rate of a k-merger, $ \lambda _k(n) = \left( {\begin{array}{c}n\\ k\end{array}}\right) \lambda _{n,k} $, is then, for $2 \le k \le n$,

$$\begin{aligned} \lambda _k(n) = \left( {\begin{array}{c}n\\ k\end{array}}\right) \frac{B(k-\alpha , n-k+\alpha )}{B(2-\alpha ,\alpha )}, \quad 1 \le \alpha < 2. \end{aligned}$$

(A36)
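Formula (A36) is easy to evaluate numerically. The sketch below works on the log scale with `math.lgamma`, so that neither the binomial coefficient nor the beta functions overflow for large n; the function names are hypothetical. For $\alpha = 1$ (the Bolthausen–Sznitman case) the rates reduce to $\lambda _k(n) = n/(k(k-1))$, which sum to $\lambda (n) = n - 1$, and this gives a quick check of the implementation.

```python
from math import exp, lgamma


def log_binom(n, k):
    """log C(n, k) via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_merger_rate(n, k, alpha):
    """lambda_k(n) of (A36) for the Beta(2 - alpha, alpha)-coalescent."""
    if not (1 <= alpha < 2 and 2 <= k <= n):
        raise ValueError("need 1 <= alpha < 2 and 2 <= k <= n")
    return exp(log_binom(n, k)
               + log_beta(k - alpha, n - k + alpha)
               - log_beta(2 - alpha, alpha))


def beta_total_rate(n, alpha):
    """lambda(n) of (A34) for the Beta(2 - alpha, alpha)-coalescent."""
    return sum(beta_merger_rate(n, k, alpha) for k in range(2, n + 1))


# alpha = 1 (Bolthausen-Sznitman): lambda_k(n) = n / (k (k - 1)), lambda(n) = n - 1
print(beta_merger_rate(6, 3, 1.0), 6 / (3 * 2))
print(beta_total_rate(6, 1.0))
print(beta_total_rate(6, 1.5))
```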

For $\alpha = 1$, $\varLambda $ is the uniform distribution on [0, 1] and the Beta$(2 - \alpha ,\alpha )$-coalescent is the Bolthausen–Sznitman coalescent [20, 39]. The Beta-coalescent is well studied; it is connected to superprocesses, continuous-state branching processes (CSBP) and continuous stable random trees, as described e.g. in [7, 14].

A3 Goldschmidt and Martin’s Construction of the Bolthausen–Sznitman n-coalescent

Following [39], we recall the construction of the Bolthausen–Sznitman n-coalescent by cutting the edges of a random recursive tree. Let ${\mathbb {T}}_n$ be a random recursive tree with n nodes. We can construct ${\mathbb {T}}_n$ sequentially as follows:

(i) Start with a single node labelled 1 (the root) and no edges.
(ii) If $i<n$ nodes are present, add a node labelled $i+1$ and one edge connecting it to a node chosen uniformly at random from [i].
(iii) Stop when n nodes are present.
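Steps (i)–(iii) translate directly into a few lines of code. In this minimal sketch the tree is represented as a list of (parent, child) edges, which is an assumption of the sketch rather than anything prescribed in [39]; the function name is hypothetical.

```python
import random


def random_recursive_tree(n, rng=random):
    """Random recursive tree T_n on nodes 1..n, built by steps (i)-(iii).

    Returns the edges as (parent, child) pairs.
    """
    edges = []  # step (i): node 1 (the root) and no edges
    for i in range(1, n):  # step (ii): i < n nodes are present
        parent = rng.randint(1, i)  # uniform choice from [i] = {1, ..., i}
        edges.append((parent, i + 1))
    return edges  # step (iii): n nodes are present


print(random_recursive_tree(5, random.Random(3)))
```

Because node $i+1$ always attaches to an earlier node, every parent label is smaller than its child's label, so the root 1 lies on the parent side of every edge.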

The object ${\mathbb {T}}_n$ is a labelled tree in which each node carries a single label. We take a realisation of ${\mathbb {T}}_n$ and transform it over time into labelled trees with fewer nodes, whose nodes accumulate multiple labels, as follows.

(i) Attach an exponential clock to each edge of ${\mathbb {T}}_n$; the clocks are i.i.d. Exp(1)-distributed.
(ii) Wait for the first clock to ring, and cut (remove) the edge whose clock rang. This splits the tree into two trees, exactly one of which contains the node with label 1. Denote this tree by ${\mathbb {T}}^{(1)}$ and the other tree by ${\mathbb {T}}^{(2)}$. Let $e_1$ be the node of ${\mathbb {T}}^{(1)}$ that was attached to the removed edge.
(iii) Add all labels of ${\mathbb {T}}^{(2)}$ to the label set of $e_1$, then remove ${\mathbb {T}}^{(2)}$ together with its clocks.
(iv) Repeat from (ii), using ${\mathbb {T}}^{(1)}$ with the labels from (iii) and its remaining clocks from (i). Stop when ${\mathbb {T}}^{(1)}$ in step (iii) consists of a single node and no edges.
(v) For any time t, the label sets at the nodes of ${\mathbb {T}}^{(1)}$ (or of ${\mathbb {T}}_n$ before the first clock has rung) form a partition $\varPi ^{(n)}_t$ of [n]. The process $(\varPi ^{(n)}_t)_{t\ge 0}$ is a Bolthausen–Sznitman n-coalescent (set $\varPi ^{(n)}_t=\{[n]\}$ if t exceeds the time at which the cutting procedure stopped).
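The cutting procedure (i)–(v) can be simulated directly. The sketch below is a minimal illustration, not code from [39]: it assumes the tree is given as (parent, child) edges with parent < child (as produced by the sequential construction, so the root always lies on the parent side of a cut), keeps label sets in a dictionary, and records the partition after every cut. The function name and data layout are hypothetical.

```python
import random


def cut_recursive_tree(edges, n, rng=random):
    """Cutting of a recursive tree on 1..n following steps (i)-(v).

    `edges` are (parent, child) pairs with parent < child. Returns the jumps
    of the partition process as a list of (time, partition) pairs.
    """
    children = {v: [] for v in range(1, n + 1)}
    for parent, child in edges:
        children[parent].append(child)
    labels = {v: {v} for v in range(1, n + 1)}
    alive = set(range(1, n + 1))  # nodes of T^(1)
    # (i) one i.i.d. Exp(1) clock per edge, processed in the order they ring
    clocks = sorted((rng.expovariate(1.0), p, c) for p, c in edges)

    def partition():
        return sorted(sorted(labels[v]) for v in alive)

    history = [(0.0, partition())]
    for t, parent, child in clocks:
        if child not in alive:
            continue  # clock belonged to an already removed T^(2)
        # (ii) cutting (parent, child) detaches the subtree below child: T^(2)
        stack, removed = [child], []
        while stack:
            v = stack.pop()
            removed.append(v)
            stack.extend(c for c in children[v] if c in alive)
        # (iii) e_1 = parent collects all labels of T^(2), which is then deleted
        for v in removed:
            labels[parent] |= labels[v]
            alive.discard(v)
        history.append((t, partition()))
        if len(alive) == 1:  # (iv) stop: a single node and no edges
            break
    return history


rng = random.Random(7)
n = 6
edges = [(rng.randint(1, i), i + 1) for i in range(1, n)]  # a random recursive tree T_n
for t, blocks in cut_recursive_tree(edges, n, rng):
    print(round(t, 3), blocks)
```

Each cut removes a subtree with m nodes and merges m + 1 label sets into one, so the recorded partitions form a strictly coarsening sequence ending in the single block [n].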

Figure 7 shows an illustration of steps (i)–(iii) for a realisation of ${\mathbb {T}}_5$.


About this article

Cite this article

Eldon, B., Freund, F. Genealogical Properties of Subsamples in Highly Fecund Populations. J Stat Phys 172, 175–207 (2018). https://doi.org/10.1007/s10955-018-2013-1


Received: 16 July 2017
Accepted: 09 March 2018
Published: 20 March 2018
Issue Date: July 2018
DOI: https://doi.org/10.1007/s10955-018-2013-1
