Genealogical Properties of Subsamples in Highly Fecund Populations

Journal of Statistical Physics

Abstract

We consider some genealogical properties of nested samples. The complete sample is assumed to have been drawn from a natural population characterised by high fecundity and sweepstakes reproduction (abbreviated HFSR). The random gene genealogies of the samples are—due to our assumption of HFSR—modelled by coalescent processes which admit multiple mergers of ancestral lineages looking back in time. Among the genealogical properties we consider are the probability that the most recent common ancestor is shared between the complete sample and the subsample nested within the complete sample; we also compare the lengths of ‘internal’ branches of nested genealogies between different coalescent processes. The results indicate how ‘informative’ a subsample is about the properties of the larger complete sample, how much information is gained by increasing the sample size, and how the ‘informativeness’ of the subsample varies between different coalescent processes.

References

  1. Agrios, G.: Plant Pathology. Academic Press, Amsterdam (2005)

  2. Árnason, E., Halldórsdóttir, K.: Nucleotide variation and balancing selection at the Ckma gene in Atlantic cod: analysis with multiple merger coalescent models. PeerJ 3, e786 (2015). https://doi.org/10.7717/peerj.786

  3. Arratia, R., Barbour, A.D., Tavaré, S.: Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society (EMS), Zürich (2003)

  4. Barney, B.T., Munkholm, C., Walt, D.R., Palumbi, S.R.: Highly localized divergence within supergenes in Atlantic cod (Gadus morhua) within the Gulf of Maine. BMC Genomics 18(1) (2017). https://doi.org/10.1186/s12864-017-3660-3

  5. Barton, N.H., Etheridge, A.M., Véber, A.: Modelling evolution in a spatial continuum. J. Stat. Mech. 2013(01), P01002 (2013). http://stacks.iop.org/1742-5468/2013/i=01/a=P01002

  6. Basu, A., Majumder, P.P.: A comparison of two popular statistical methods for estimating the time to most recent common ancestor (TMRCA) from a sample of DNA sequences. J. Genet. 82(1–2), 7–12 (2003)

  7. Berestycki, J., Berestycki, N., Schweinsberg, J.: Beta-coalescents and continuous stable random trees. Ann. Probab. 35, 1835–1887 (2007)

  8. Berestycki, J., Berestycki, N., Schweinsberg, J.: Small-time behavior of beta coalescents. Ann. Inst. H. Poincaré Probab. Stat. 44, 214–238 (2008)

  9. Berestycki, N.: Recent progress in coalescent theory. Ensaios Matemáticos 16, 1–193 (2009)

  10. Bertoin, J.: Exchangeable coalescents. Cours d’école doctorale, pp. 20–24 (2010)

  11. Bhaskar, A., Clark, A., Song, Y.: Distortion of genealogical properties when the sample size is very large. PNAS 111, 2385–2390 (2014)

  12. Birkner, M., Blath, J.: Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J. Math. Biol. 57, 435–465 (2008)

  13. Birkner, M., Blath, J.: Coalescents and population genetic inference. Trends Stoch. Anal. 353, 329 (2009)

  14. Birkner, M., Blath, J., Capaldo, M., Etheridge, A.M., Möhle, M., Schweinsberg, J., Wakolbinger, A.: Alpha-stable branching and beta-coalescents. Electron. J. Probab. 10, 303–325 (2005)

  15. Birkner, M., Blath, J., Eldon, B.: An ancestral recombination graph for diploid populations with skewed offspring distribution. Genetics 193, 255–290 (2013)

  16. Birkner, M., Blath, J., Eldon, B.: Statistical properties of the site-frequency spectrum associated with \(\varLambda \)-coalescents. Genetics 195, 1037–1053 (2013)

  17. Birkner, M., Blath, J., Möhle, M., Steinrücken, M., Tams, J.: A modified lookdown construction for the Xi-Fleming-Viot process with mutation and populations with recurrent bottlenecks. ALEA Lat. Am. J. Probab. Math. Stat. 6, 25–61 (2009)

  18. Birkner, M., Blath, J., Steinrücken, M.: Analysis of DNA sequence variation within marine species using Beta-coalescents. Theor. Popul. Biol. 87, 15–24 (2013)

  19. Blath, J., Cronjäger, M.C., Eldon, B., Hammer, M.: The site-frequency spectrum associated with \(\varXi \)-coalescents. Theor. Popul. Biol. 110, 36–50 (2016). https://doi.org/10.1016/j.tpb.2016.04.002

  20. Bolthausen, E., Sznitman, A.: On Ruelle’s probability cascades and an abstract cavity method. Commun. Math. Phys. 197, 247–276 (1998)

  21. Capra, J.A., Stolzer, M., Durand, D., Pollard, K.S.: How old is my gene? Trends Genet. 29(11), 659–668 (2013)

  22. Desai, M.M., Walczak, A.M., Fisher, D.S.: Genetic diversity and the structure of genealogies in rapidly adapting populations. Genetics 193(2), 565–585 (2013)

  23. Dong, R., Gnedin, A., Pitman, J.: Exchangeable partitions derived from Markovian coalescents. Ann. Appl. Probab. 17, 1172–1201 (2007)

  24. Donnelly, P., Kurtz, T.G.: Particle representations for measure-valued population models. Ann. Probab. 27, 166–205 (1999)

  25. Donnelly, P., Tavaré, S.: Coalescents and genealogical structure under neutrality. Annu. Rev. Genet. 29(1), 401–421 (1995)

  26. Durrett, R.: Probability Models for DNA Sequence Evolution, 2nd edn. Springer, New York (2008)

  27. Durrett, R., Schweinsberg, J.: Approximating selective sweeps. Theor. Popul. Biol. 66, 129–138 (2004)

  28. Durrett, R., Schweinsberg, J.: A coalescent model for the effect of advantageous mutations on the genealogy of a population. Stoch. Proc. Appl. 115, 1628–1657 (2005)

  29. Eldon, B.: Inference methods for multiple merger coalescents. In: Pontarotti, P. (ed.) Evolutionary Biology: Convergent Evolution, Evolution of Complex Traits, Concepts and Methods, pp. 347–371. Springer, New York (2016)

  30. Eldon, B., Birkner, M., Blath, J., Freund, F.: Can the site-frequency spectrum distinguish exponential population growth from multiple-merger coalescents? Genetics 199, 841–856 (2015)

  31. Eldon, B., Wakeley, J.: Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics 172, 2621–2633 (2006)

  32. Eldon, B., Wakeley, J.: Linkage disequilibrium under skewed offspring distribution among individuals in a population. Genetics 178, 1517–1532 (2008)

  33. Etheridge, A.: Some Mathematical Models from Population Genetics. Springer, Berlin (2011). https://doi.org/10.1007/978-3-642-16632-7

  34. Etheridge, A., Griffiths, R.: A coalescent dual process in a Moran model with genic selection. Theor. Popul. Biol. 75, 320–330 (2009)

  35. Etheridge, A.M., Griffiths, R.C., Taylor, J.E.: A coalescent dual process in a Moran model with genic selection, and the Lambda coalescent limit. Theor. Popul. Biol. 78, 77–92 (2010)

  36. Ewens, W.J.: Mathematical Population Genetics 1: Theoretical Introduction, vol. 27. Springer, New York (2012)

  37. Freund, F., Möhle, M.: On the size of the block of 1 for \(\varXi \)-coalescents with dust. Modern Stoch. Theory Appl. 4(4), 407–425 (2017). https://doi.org/10.15559/17-VMSTA92

  38. Freund, F., Siri-Jégousse, A.: Minimal clade size in the Bolthausen-Sznitman coalescent. J. Appl. Probab. 51(3), 657–668 (2014)

  39. Goldschmidt, C., Martin, J.B.: Random recursive trees and the Bolthausen-Sznitman coalescent. Electron. J. Probab. 10(21), 718–745 (2005)

  40. Griffiths, R.C., Tavaré, S.: Monte Carlo inference methods in population genetics. Math. Comput. Model. 23(8–9), 141–158 (1996)

  41. Griffiths, R.C., Tavaré, S.: The age of a mutation in a general coalescent tree. Commun. Stat. Stoch. Model. 14, 273–295 (1998)

  42. Griswold, C.K., Baker, A.J.: Time to the most recent common ancestor and divergence times of populations of common chaffinches (Fringilla coelebs) in Europe and North Africa: insights into Pleistocene refugia and current levels of migration. Evolution 56(1), 143–153 (2002)

  43. Halldórsdóttir, K., Árnason, E.: Whole-genome sequencing uncovers cryptic and hybrid species among Atlantic and Pacific cod-fish (2015). https://www.biorxiv.org/content/early/2015/12/20/034926

  44. Hintze, J.L., Nelson, R.D.: Violin plots: a box plot-density trace synergism. Am. Stat. 52(2), 181–184 (1998). https://doi.org/10.1080/00031305.1998.10480559

  45. Hedgecock, D.: Does variance in reproductive success limit effective population sizes of marine organisms? In: Beaumont, A. (ed.) Genetics and Evolution of Aquatic Organisms, pp. 1222–1344. Chapman and Hall, London (1994)

  46. Hedgecock, D., Pudovkin, A.I.: Sweepstakes reproductive success in highly fecund marine fish and shellfish: a review and commentary. Bull. Mar. Sci. 87, 971–1002 (2011)

  47. Hedrick, P.: Large variance in reproductive success and the \({N}_e/{N}\) ratio. Evolution 59(7), 1596 (2005). https://doi.org/10.1554/05-009

  48. Hénard, O.: The fixation line in the \({\varLambda }\)-coalescent. Ann. Appl. Probab. 25(5), 3007–3032 (2015)

  49. Herriger, P., Möhle, M.: Conditions for exchangeable coalescents to come down from infinity. Alea 9(2), 637–665 (2012)

  50. Hird, S., Kubatko, L., Carstens, B.: Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling. Mol. Phylogenetics Evol. 57(2), 888–898 (2010)

  51. Hovmøller, M.S., Sørensen, C.K., Walter, S., Justesen, A.F.: Diversity of Puccinia striiformis on cereals and grasses. Annu. Rev. Phytopathol. 49, 197–217 (2011)

  52. Hudson, R.R.: Properties of a neutral allele model with intragenic recombination. Theor. Popul. Biol. 23, 183–201 (1983)

  53. Huillet, T., Möhle, M.: On the extended Moran model and its relation to coalescents with multiple collisions. Theor. Popul. Biol. 87, 5–14 (2013)

  54. Kaj, I., Krone, S.M.: The coalescent process in a population with stochastically varying size. J. Appl. Probab. 40(01), 33–48 (2003)

  55. King, L., Wakeley, J.: Empirical Bayes estimation of coalescence times from nucleotide sequence data. Genetics 204(1), 249–257 (2016). https://doi.org/10.1534/genetics.115.185751

  56. Kingman, J.F.C.: The coalescent. Stoch. Proc. Appl. 13, 235–248 (1982)

  57. Kingman, J.F.C.: Exchangeability and the evolution of large populations. In: Koch, G., Spizzichino, F. (eds.) Exchangeability in Probability and Statistics, pp. 97–112. North-Holland, Amsterdam (1982)

  58. Kingman, J.F.C.: On the genealogy of large populations. J. Appl. Probab. 19A, 27–43 (1982)

  59. Li, G., Hedgecock, D.: Genetic heterogeneity, detected by PCR-SSCP, among samples of larval Pacific oysters (Crassostrea gigas) supports the hypothesis of large variance in reproductive success. Can. J. Fish. Aquat. Sci. 55(4), 1025–1033 (1998). https://doi.org/10.1139/f97-312

  60. May, A.W.: Fecundity of Atlantic cod. J. Fish. Res. Board Can. 24, 1531–1551 (1967)

  61. Möhle, M.: Robustness results for the coalescent. J. Appl. Probab. 35(02), 438–447 (1998)

  62. Möhle, M.: On sampling distributions for coalescent processes with simultaneous multiple collisions. Bernoulli 12(1), 35–53 (2006)

  63. Möhle, M.: Coalescent processes derived from some compound Poisson population models. Electron. Commun. Probab. 16, 567–582 (2011)

  64. Möhle, M., Sagitov, S.: A classification of coalescent processes for haploid exchangeable population models. Ann. Probab. 29, 1547–1562 (2001)

  65. Möhle, M., Sagitov, S.: Coalescent patterns in diploid exchangeable population models. J. Math. Biol. 47, 337–352 (2003)

  66. Neher, R.A., Hallatschek, O.: Genealogies of rapidly adapting populations. Proc. Natl. Acad. Sci. 110(2), 437–442 (2013)

  67. Niwa, H.S., Nashida, K., Yanagimoto, T.: Reproductive skew in Japanese sardine inferred from DNA sequences. ICES J. Mar. Sci. 73(9), 2181–2189 (2016). https://doi.org/10.1093/icesjms/fsw070

  68. Oosthuizen, E., Daan, N.: Egg fecundity and maturity of North Sea cod, Gadus morhua. Neth. J. Sea Res. 8(4), 378–397 (1974)

  69. Pettengill, J.B.: The time to most recent common ancestor does not (usually) approximate the date of divergence. PLoS ONE 10(8), e0128407 (2015)

  70. Pitman, J.: Coalescents with multiple collisions. Ann. Probab. 27, 1870–1902 (1999)

  71. Sagitov, S.: The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Probab. 36, 1116–1125 (1999)

  72. Sagitov, S.: Convergence to the coalescent with simultaneous mergers. J. Appl. Probab. 40, 839–854 (2003)

  73. Sargsyan, O., Wakeley, J.: A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theor. Popul. Biol. 74, 104–114 (2008)

  74. Saunders, I.W., Tavaré, S., Watterson, G.A.: On the genealogy of nested subsamples from a haploid population. Adv. Appl. Probab. 16(3), 471 (1984). https://doi.org/10.2307/1427285

  75. Schweinsberg, J.: Rigorous results for a population model with selection II: genealogy of the population. Electron. J. Probab. (2017). https://doi.org/10.1214/17-EJP58

  76. Schweinsberg, J.: Coalescents with simultaneous multiple collisions. Electron. J. Probab. 5, 1–50 (2000)

  77. Schweinsberg, J.: A necessary and sufficient condition for the \(\varLambda \)-coalescent to come down from infinity. Electron. Commun. Probab. 5, 1–11 (2000)

  78. Schweinsberg, J.: Coalescent processes obtained from supercritical Galton-Watson processes. Stoch. Proc. Appl. 106, 107–139 (2003)

  79. Simon, M., Cordo, C.: Inheritance of partial resistance to Septoria tritici in wheat (Triticum aestivum): limitation of pycnidia and spore production. Agronomie 17(6–7), 343–347 (1997)

  80. Slack, R.: A branching process with mean one and possibly infinite variance. Probab. Theory Relat. Fields 9(2), 139–145 (1968)

  81. Spouge, J.L.: Within a sample from a population, the distribution of the number of descendants of a subsample’s most recent common ancestor. Theor. Popul. Biol. 92, 51–54 (2014)

  82. Tajima, F.: Evolutionary relationships of DNA sequences in finite populations. Genetics 105, 437–460 (1983)

  83. Timm, A., Yin, J.: Kinetics of virus production from single cells. Virology 424(1), 11–17 (2012)

  84. Wakeley, J.: Coalescent Theory. Roberts & Co, Greenwood Village (2007)

  85. Wakeley, J., Takahashi, T.: Gene genealogies when the sample size exceeds the effective size of the population. Mol. Biol. Evol. 20, 208–213 (2003)

  86. Waples, R.S.: Tiny estimates of the \({N_e}/{N}\) ratio in marine fishes: are they real? J. Fish Biol. 89(6), 2479–2504 (2016). https://doi.org/10.1111/jfb.13143

  87. Wiuf, C., Donnelly, P.: Conditional genealogies and the age of a neutral mutant. Theor. Popul. Biol. 56(2), 183–201 (1999). https://doi.org/10.1006/tpbi.1998.1411

Acknowledgements

We thank Alison Etheridge for many and very valuable comments and suggestions, especially regarding Theorem 1. BE was funded by DFG grant STE 325/17-1 to Wolfgang Stephan through Priority Programme SPP1819: Rapid Evolutionary Adaptation. FF was funded by DFG grant FR 3633/2-1 through Priority Program 1590: Probabilistic Structures in Evolution.

Author information

Corresponding author

Correspondence to Bjarki Eldon.

Appendices

A1 Population Models

In this section we provide a brief overview of the population models behind the coalescent processes we consider, and why we think they are interesting. A detailed description of the coalescent processes is given in Sect. A2.

A universal mechanism among all biological populations is reproduction and inheritance. Reproduction refers to the generation of offspring, and inheritance refers to the transmission of information necessary for viability and reproduction. Mendel’s laws on independent segregation of chromosomes into gametes describe the transmission of information from a parent to an offspring in a diploid population. For our purposes, however, it suffices to consider haploid populations, in which one can think of an individual as a single gene copy. By tracing gene copies as they are passed on from one generation to the next, one automatically stores two sets of information. On the one hand, one stores how the frequencies of genetic types change going forwards in time; on the other hand, one keeps track of the ancestral, or genealogical, relations among the different copies. This duality has been successfully exploited, for example, in modelling selection [34, 35]. To model genetic variation in natural populations one requires a mathematically tractable model of how genetic information is passed from parents to offspring. In the Wright–Fisher model offspring choose their parents independently and uniformly at random. Suppose we are tracing the ancestry of \(n \ge 2\) gene copies in a haploid Wright–Fisher population of N gene copies in total. For any pair, the chance that they have a common ancestor in the previous generation is 1/N. Informally, we trace the genealogy of our gene copies for \({\mathscr {O}}(N)\) generations until we see the first merger, i.e. until at least 2 gene copies (or their ancestral lines) find a common ancestor. If n is small relative to N, then when a merger occurs it involves just two ancestral lineages with probability \(1-{\mathscr {O}}( 1/N)\).
This means that if we measure time in units of N generations, and assume N is very large, the random ancestral relations of our sampled gene copies can be described by a continuous-time Markov chain in which each pair of ancestral lines merges at rate 1 and no other mergers are possible. We have, in an informal way, arrived at the Kingman-coalescent [56,57,58]. One can derive the Kingman-coalescent not just from the Wright–Fisher model but from any population model which satisfies certain assumptions on the offspring distribution [61, 64, 71]. These assumptions mainly dictate that higher moments of the offspring number distribution are small relative to (an appropriate power of) the population size. The Kingman-coalescent, and its various extensions, are used almost universally as the ‘null model’ for a gene genealogy in population genetics. The Kingman-coalescent is a remarkably good model for populations characterised by low fecundity, i.e. whose individuals have small numbers of offspring relative to the population size.
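
The pairwise-merger description above can be illustrated with a short simulation. The following is a minimal sketch, not part of the original appendix: while k lineages remain, the waiting time to the next merger is exponential with rate k(k-1)/2 (each pair merges at rate 1), and the simulated mean time to the most recent common ancestor is compared with the standard value 2(1 - 1/n). Time is measured in units of N generations; the function names `kingman_tmrca` and `expected_tmrca` are illustrative.

```python
import random

def kingman_tmrca(n, rng):
    """Time to the most recent common ancestor of n lineages under Kingman."""
    t = 0.0
    k = n
    while k > 1:
        pair_rate = k * (k - 1) / 2   # each of the k-choose-2 pairs merges at rate 1
        t += rng.expovariate(pair_rate)
        k -= 1                        # exactly two lineages merge per event
    return t

def expected_tmrca(n):
    """Sum of the mean waiting times 1 / (k choose 2) for k = 2, ..., n."""
    return sum(2.0 / (k * (k - 1)) for k in range(2, n + 1))

rng = random.Random(1)
n = 10
replicates = 20000
mean_sim = sum(kingman_tmrca(n, rng) for _ in range(replicates)) / replicates
print(mean_sim, expected_tmrca(n))
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `replicates` grows.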

The classical Kingman-coalescent is derived from a population model in which the population size is constant between generations. Extensions to stochastically varying population size, in which the population size does not vary ‘too much’ between generations, have been made [54]; the result is a time-changed Kingman-coalescent. Probably the most commonly applied model of deterministically changing population size is the model of exponential population growth (see e.g. [25, 30, 41]). In each generation the population size is multiplied by a factor \((1+\beta /N)\), where \(\beta > 0\). Therefore, the population size in generation k going forward in time is given by \(N_k = N(1 + \beta / N)^k\), where N is taken as the ‘initial’ population size. It follows that, for large N, the population size \(\lfloor Nt\rfloor \) generations ago is approximately \(Ne^{-\beta t}\), since \((1+\beta /N)^{-\lfloor Nt\rfloor } \rightarrow e^{-\beta t}\) as \(N \rightarrow \infty \). [30] show that exponential population growth can be distinguished from multiple-merger coalescents (in which at least three ancestral lineages can merge simultaneously), derived from population models of high fecundity and sweepstakes reproduction, using population genetic data from a single locus, provided that sample size and number of mutations (segregating sites) are not too small.
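
The exponential approximation for the past population size can be checked numerically. This is a small sketch under the growth model stated above; the helper names `past_size` and `limit_size` and the parameter values are illustrative choices, not taken from the paper.

```python
import math

def past_size(N, beta, t):
    """Exact size floor(N*t) generations ago: N * (1 + beta/N) ** (-floor(N*t))."""
    generations = math.floor(N * t)
    return N * (1 + beta / N) ** (-generations)

def limit_size(N, beta, t):
    """Large-N approximation N * exp(-beta * t)."""
    return N * math.exp(-beta * t)

for N in (10**2, 10**4, 10**6):
    ratio = past_size(N, 1.0, 2.0) / limit_size(N, 1.0, 2.0)
    print(N, ratio)
```

The printed ratio approaches 1 as N grows, which is the sense in which the backward-in-time size is approximately \(Ne^{-\beta t}\).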

A diverse group of natural populations, including some marine organisms [46], fungi [1, 51, 79], and viruses [83], are highly fecund. By way of example, individual Atlantic codfish [60, 68] and Pacific oysters [59] can lay millions of eggs. This high fecundity counteracts the high mortality rate among the larvae (juveniles) of these populations (Type III survivorship). The term ‘sweepstakes reproduction’ has been proposed to describe the reproduction mode of highly fecund populations with Type III survivorship [45]. Population models which admit high fecundity and sweepstakes reproduction (HFSR) through skewed or heavy-tailed offspring number distributions have been developed [31, 53, 64, 65, 73, 78]. In the haploid model of [78], each individual independently contributes a random number X of juveniles, where, for constants \(C, \alpha > 0\),

$$\begin{aligned} {\mathbb {P}}\left( X \ge k \right) \sim \frac{C}{k^\alpha }, \quad k \rightarrow \infty , \end{aligned}$$
(A28)

and \(x_n \sim y_n\) means \(x_n/y_n \rightarrow 1\) as \(n\rightarrow \infty \). The constant \(C > 0\) is a normalising constant, and the constant \(\alpha \) determines the skewness of the distribution. The next generation of individuals is then formed by sampling (uniformly without replacement) from the pool of juveniles. In the case \(\alpha < 2\) the random ancestral relations of gene copies can be described by specific forms of multiple-merger coalescent processes [72]. We remark that the fates of the juveniles need not be correlated to generate multiple mergers in the genealogies: the heavy-tailed distribution of juvenile numbers means that occasionally one ‘lucky’ individual contributes a huge number of juveniles while all others contribute only a small number. Uniform sampling without replacement from the pool of juveniles means that the lucky individual leaves significantly more descendants in the next generation than anyone else, and this is what generates multiple mergers of ancestral lines.
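
One generation of such a model can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the construction used in [78]: it takes the particular law \({\mathbb {P}}(X \ge k) = k^{-\alpha }\) (so C = 1 in (A28)), obtained by inverting a uniform variable, and it truncates X at a hypothetical `cap` purely to bound memory use. The names `sample_juveniles` and `next_generation` are illustrative.

```python
import random
from collections import Counter

def sample_juveniles(alpha, rng, cap=10**5):
    """Draw X with P(X >= k) = k**(-alpha) for integers 1 <= k <= cap (C = 1)."""
    u = 1.0 - rng.random()            # uniform on (0, 1], avoids division by zero
    return min(int(u ** (-1.0 / alpha)), cap)

def next_generation(N, alpha, rng):
    """Family sizes after sampling N juveniles uniformly without replacement."""
    juveniles = [sample_juveniles(alpha, rng) for _ in range(N)]
    # The pool lists each juvenile by the label of its parent.
    pool = [parent for parent, x in enumerate(juveniles) for _ in range(x)]
    survivors = rng.sample(pool, N)   # uniform sampling without replacement
    return Counter(survivors)

rng = random.Random(7)
draws = 100000
tail = sum(sample_juveniles(1.5, rng) >= 10 for _ in range(draws)) / draws
families = next_generation(500, 1.2, rng)
print(tail, 10 ** -1.5)                 # empirical versus exact tail at k = 10
print(max(families.values()) / 500)     # share of the largest family
```

Repeating `next_generation` for smaller values of \(\alpha \) tends to produce a larger share for the biggest family, which is the mechanism behind multiple mergers described above.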

Coalescent processes derived from population models of HFSR (see (A28) for an example) admit multiple mergers of ancestral lineages [24, 63, 65, 70, 71, 72, 76]. Mathematically, we consider exchangeable n-coalescent processes, which are Markovian processes \((\varPi _t^{(n)})_{t\ge 0}\) on the set of partitions of \([n] := \{1,2,\ldots , n\}\) whose transitions are mergers of partition blocks (a ‘block’ is a subset of [n], see Sect. A2) with rates specified in Sect. A2. The blocks of \(\varPi _t^{(n)}\) show which individuals in [n] share a common ancestor at time t measured from the time of sampling. Thus, the blocks of \(\varPi _t^{(n)}\) can be interpreted as ancestral lineages. The specific structure of the transition rates allows us to treat a multiple-merger n-coalescent as the restriction to [n] of an exchangeable Markovian process \((\varPi _t)_{t\ge 0}\) on the set of partitions of \({\mathbb {N}}\), which is called a multiple-merger coalescent (abbreviated MMC) process. MMC processes are referred to as \(\varLambda \)-coalescents (\(\varLambda \) a finite measure on [0, 1]) [24, 70, 71] if any number of ancestral lineages can merge at any given time, but only one such merger occurs at a time. By way of example, if \(1 \le \alpha < 2\) in (A28) one obtains a so-called Beta\((2-\alpha ,\alpha )\)-coalescent [72] (Beta-coalescent, see Eq. (A35)). Processes which admit two or more simultaneous (multiple) mergers are referred to as \(\varXi \)-coalescents (\(\varXi \) a finite measure on the infinite simplex \(\varDelta \)) [64, 65, 76]. See Sect. A2 for details. Specific examples of these MMC processes have been shown to give a better fit to genetic data sampled from Atlantic cod [2, 12, 16, 18, 19] and Japanese sardines [67] than the classical Kingman-coalescent. See e.g. [29] for an overview of inference methods for MMC processes. [46] review the evidence for sweepstakes reproduction among marine populations and conclude ‘that it plays a major role in shaping marine biodiversity’.

MMC models also arise in contexts other than high fecundity. [17] show that repeated strong bottlenecks in a Wright–Fisher population lead to time-changed Kingman-coalescents which look like \(\varXi \)-coalescents. [27, 28] show that the genealogy of a locus subjected to repeated beneficial mutations is well approximated by a \(\varXi \)-coalescent. [75] provides rigorous justification of the claims of [22, 66] that the genealogy of a population subject to repeated beneficial mutations can be described by the Beta-coalescent with \(\alpha = 1\) (also referred to as the Bolthausen–Sznitman coalescent [20]). These examples show that MMC processes are relevant for biology. We refer the interested reader to e.g. [5, 9, 10, 13, 25, 33] for a more detailed background on coalescent theory.

A2 Coalescent Processes

To keep our presentation self-contained, we now give a precise definition of the coalescent processes we need. We follow the description of [19]. A coalescent process \(\varPi \) is a continuous-time Markov chain on the partitions of \({\mathbb {N}}\). Let \(\varPi ^{(n)}\) denote the restriction to [n], and write \({\mathscr {P}}_n\) for the space of partitions of [n]. A partition \(\pi = \{\pi _1, \ldots , \pi _{\#\pi } \} \in {\mathscr {P}}_n\) has \(\#\pi \) blocks, which are disjoint subsets of [n]. We assume the blocks \(\pi _i\) are ordered by their smallest element; therefore we always have \(1 \in \pi _1\). In general a merging event can involve r distinct groups of blocks merging simultaneously. We write \(\underline{k} = (k_1, \ldots , k_r)\), where \(k_i \ge 2\) denotes the number of blocks merging in group i. Here \(r \in [\lfloor \#\pi / 2 \rfloor ]\), \(k_1 + \cdots + k_r \in [\#\pi ]_2\) (with \([m]_2 := \{2, \ldots , m\}\)), and \(i_1^{(a)},\ldots , i_{k_a}^{(a)}\) will denote the indices of the blocks in the \(a\hbox {th}\) group. By \(\pi ^\prime \prec _{ \#\pi , \underline{k}} \pi \) we denote a transition from \(\pi \) to \(\pi ^\prime = A\cup B\) where

$$\begin{aligned} \begin{aligned} A =\,&\left\{ \pi _\ell : \ell \in [\#\pi ], \ell \notin \bigcup _{a=1}^r \left\{ i_{1}^{(a)}, \ldots , i_{k_a}^{(a)} \right\} \right\} , \\ B =\,&\bigcup _{b=1}^r \left\{ \pi _{i_{1}^{(b)} }, \ldots , \pi _{i_{k_b}^{(b)} } \right\} . \\ \end{aligned} \end{aligned}$$
(A29)

In (A29), set A (possibly empty) contains the blocks not involved in a merger, and B lists the blocks involved in each of the r mergers. By \(\pi ^\prime \prec _{ \#\pi , k} \pi \) we denote the transition in a \(\varLambda \)-coalescent where \(k \in [\#\pi ]_2\) blocks merge in a single merger and \(\pi ^\prime \) is given as in (A29) with \(r = 1\); i.e. only one group of blocks merges in each transition. By \(\pi ^\prime \prec _{\#\pi } \pi \) we denote a transition in the Kingman-coalescent, where \(r = 1\) and exactly 2 blocks merge in each transition.
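
The three kinds of transition can be made concrete with a small helper. This is a sketch under one interpretive assumption: for each group, the new partition is taken to contain the union of the blocks in that group, which is how we read B in (A29). The function name `merge` and the example partition are illustrative; block indices are 1-based as in the text.

```python
def merge(partition, groups):
    """Apply one merger event: each group of 1-based block indices becomes one block."""
    merging = [i for group in groups for i in group]
    if len(set(merging)) != len(merging) or any(len(g) < 2 for g in groups):
        raise ValueError("groups must be disjoint and hold at least two blocks each")
    # Set A: blocks not involved in any merger.
    untouched = [b for i, b in enumerate(partition, start=1) if i not in merging]
    # Set B (as read here): one merged block per group.
    merged = [frozenset().union(*(partition[i - 1] for i in g)) for g in groups]
    return sorted(untouched + merged, key=min)   # order blocks by smallest element

pi = [frozenset({1}), frozenset({2, 5}), frozenset({3}), frozenset({4}), frozenset({6})]
xi_step = merge(pi, [(1, 3), (2, 4, 5)])   # r = 2 simultaneous mergers, k = (2, 3)
lambda_step = merge(pi, [(2, 3, 4)])       # r = 1, a single 3-merger
kingman_step = merge(pi, [(1, 2)])         # r = 1, exactly two blocks merge
print(xi_step, lambda_step, kingman_step, sep="\n")
```

Only the first call is possible in a \(\varXi \)-coalescent but not in a \(\varLambda \)-coalescent; the last call is the only type of transition allowed in the Kingman-coalescent.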

Now that we have specified the possible transitions, we can state the rates of the transitions. Let \(\varDelta \) denote the infinite simplex \(\varDelta = \{(x_1, x_2, \ldots ) : x_1 \ge x_2 \ge \ldots \ge 0, \sum _i x_i \le 1\}\); let \(\varvec{x}\) denote an element of \(\varDelta \). Define the functions \(f(\varvec{x};\#\pi ,\underline{k})\) and \(g(\varvec{x};n)\) on \(\varDelta _{\varvec{0}} := \varDelta {\setminus } \{(0,0,\ldots )\}\), with the convention \(\prod _{m=1}^0 x_{i_{r+m}} := 1\) for the empty product and with \(s = \#\pi - k_1 - \ldots - k_r\), by

$$\begin{aligned} \begin{aligned} f(\varvec{x};\#\pi ,\underline{k}) =\,&\frac{1}{\sum _j x_j^2 } \sum _{\ell = 0}^s \sum _{i_1 \ne \ldots \ne i_{r + \ell } } \left( {\begin{array}{c}s \\ \ell \end{array}}\right) x_{i_1}^{k_1}\cdots x_{i_r}^{k_r} \prod _{m=1}^\ell x_{i_{r+m}} \left( 1 - \sum _j x_j \right) ^{s - \ell }, \\ g(\varvec{x};n ) =\,&\frac{1 - \sum \limits _{\ell = 0 }^{n } \sum \limits _{i_1 \ne \ldots \ne i_{\ell } } \left( {\begin{array}{c} n \\ \ell \end{array}}\right) x_{i_1}\cdots x_{i_\ell } \left( 1 - \sum _j x_j \right) ^{n - \ell } }{\sum _j x_j^2 }. \\ \end{aligned} \end{aligned}$$
(A30)

Here \(x_{i_0} := 1\). For a finite measure \(\varXi \) on \(\varDelta \), set \(\varXi _{\varvec{0}}:=\varXi (\cdot \cap \varDelta _{\varvec{0}})\) and \(a:=\varXi (\{(0,0,\ldots )\})\). Then, define

$$\begin{aligned} \begin{aligned} \lambda _{n,\underline{k} } :=\,&\int _{ \varDelta _{\varvec{0}}} f(\varvec{x};n,\underline{k} )\, \varXi _{\varvec{0}}(d\varvec{x}) + a{\mathbb {1}}_{\{r=1,\,k_1=2\}}, \\ \lambda _{n} :=\,&\int _{ \varDelta _{\varvec{0}}} g(\varvec{x};n)\, \varXi _{\varvec{0}}(d\varvec{x}) + a\left( {\begin{array}{c}n\\ 2\end{array}}\right) . \end{aligned} \end{aligned}$$
(A31)

A \(\varXi \)-coalescent [76] is a continuous-time \({\mathscr {P}}_n\)-valued Markov chain whose transition rates \(q_{\pi , \pi ^\prime }\) are given, with \( \lambda _{n,\underline{k} }\) and \(\lambda _n \) as in (A31), by

$$\begin{aligned} q_{\pi , \pi ^\prime } = {\left\{ \begin{array}{ll} \lambda _{n,\underline{k} } &{} \text {if }\pi ^\prime \prec _{ \#\pi , \underline{k}} \pi , \#\pi = n, \\ -\lambda _n &{} \text {if }\pi ^\prime = \pi \text { and }n = \#\pi , \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(A32)

A \(\varLambda \)-coalescent [24, 70, 71] is a special case of a \(\varXi \)-coalescent in which \(\varXi _{\varvec{0}}\) is supported on \(\varDelta _0 := \varDelta _{\varvec{0}} \cap \{(x_1, x_2, \ldots ) : x_1 \in (0,1],\, x_{1+i} = 0\,\, \forall \,\, i \in {\mathbb {N}}\}\) [76]. Let \(\varLambda \) denote the restriction of \(\varXi \) to its first coordinate (which makes \(\varLambda \) a finite measure on [0, 1]). With \(\#\pi = n\), the transition rate of \(\pi ^\prime \prec _{ \#\pi , k} \pi \) then becomes

$$\begin{aligned} \lambda _{n,k} = \int _0^1 x^{k-2}(1-x)^{n-k}\varLambda (dx), \quad 2 \le k \le n. \end{aligned}$$
(A33)

The total rate of k-mergers in a \(\varLambda \)-coalescent is given by \( \lambda _k(n) = \genfrac(){0.0pt}1{n}{k} \lambda _{n,k}\) for \(2 \le k \le n\). The total rate of mergers given \(n \ge 2\) active blocks is

$$\begin{aligned} \lambda (n) = \lambda _2(n) + \cdots + \lambda _n(n). \end{aligned}$$
(A34)

An important example of a \(\varLambda \)-coalescent is the Beta\((2 - \alpha , \alpha )\)-coalescent [78], whose \(\varLambda \) measure has the following beta density, with \(B(\cdot ,\cdot )\) denoting the beta function,

$$\begin{aligned} \varLambda (dx) = \frac{ x^{1-\alpha }(1-x)^{\alpha - 1} }{B(2-\alpha ,\alpha )}dx, \quad 1 \le \alpha < 2. \end{aligned}$$
(A35)

For \(2 \le k \le n\), the total rate of k-mergers, \( \lambda _k(n) = \genfrac(){0.0pt}1{n}{k} \lambda _{n,k} \) (see Eq. (A33)), is then given by

$$\begin{aligned} \lambda _k(n) = \left( {\begin{array}{c}n\\ k\end{array}}\right) \frac{B(k-\alpha , n-k+\alpha )}{B(2-\alpha ,\alpha )}, \quad 1 \le \alpha < 2. \end{aligned}$$
(A36)
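The rates in (A34) and (A36) can be evaluated numerically. The following is a minimal sketch, assuming only the Python standard library; the function names (`log_beta`, `beta_coalescent_rate`, `total_merger_rate`, `merger_size_distribution`) are hypothetical and introduced here for illustration. Working in log space is a design choice made to avoid overflow of the binomial coefficient for larger \(n\).

```python
from math import exp, lgamma


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b) for a, b > 0."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_coalescent_rate(n: int, k: int, alpha: float) -> float:
    """Total rate lambda_k(n) of a k-merger among n blocks, following (A36)."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    if not 1.0 <= alpha < 2.0:
        raise ValueError("need 1 <= alpha < 2")
    # log of the binomial coefficient n choose k
    log_binom = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    log_ratio = log_beta(k - alpha, n - k + alpha) - log_beta(2 - alpha, alpha)
    return exp(log_binom + log_ratio)


def total_merger_rate(n: int, alpha: float) -> float:
    """Total merger rate lambda(n) of (A34): the sum over k = 2, ..., n."""
    return sum(beta_coalescent_rate(n, k, alpha) for k in range(2, n + 1))


def merger_size_distribution(n: int, alpha: float) -> dict:
    """Probability that the next merger among n blocks is a k-merger."""
    total = total_merger_rate(n, alpha)
    return {k: beta_coalescent_rate(n, k, alpha) / total for k in range(2, n + 1)}
```

For instance, `merger_size_distribution(10, 1.5)` returns the jump distribution of the block-counting chain started from 10 blocks; for \(n = 2\) the only possible merger is a binary one, so `beta_coalescent_rate(2, 2, alpha)` equals 1 for every admissible \(\alpha \).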

For \(\alpha = 1\) the Beta\((2 - \alpha ,\alpha )\)-coalescent is the Bolthausen–Sznitman coalescent [20, 39]. The Beta-coalescent is well studied; it is connected to superprocesses, continuous-state branching processes (CSBP) and continuous stable random trees, as described e.g. in [7, 14].
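As a quick consistency check, the following sketch shows what (A36) reduces to at \(\alpha = 1\), using only \(B(1,1) = 1\) and \(B(a,b) = \Gamma (a)\Gamma (b)/\Gamma (a+b)\):

$$\begin{aligned} \lambda _k(n) = \left( {\begin{array}{c}n\\ k\end{array}}\right) \frac{(k-2)!\,(n-k)!}{(n-1)!} = \frac{n}{k(k-1)}, \qquad \lambda (n) = n \sum _{k=2}^{n} \left( \frac{1}{k-1} - \frac{1}{k} \right) = n - 1. \end{aligned}$$

The total rate \(n - 1\) agrees with the \(n - 1\) independent Exp(1) clocks attached to the edges of \({\mathbb {T}}_n\) in the construction recalled in Section A3.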

A3 Goldschmidt and Martin’s Construction of the Bolthausen–Sznitman n-coalescent

From [39], we recall the construction of the Bolthausen–Sznitman n-coalescent by cutting the edges of a random recursive tree. Let \({\mathbb {T}}_n\) be a random recursive tree with n nodes. We can construct \({\mathbb {T}}_n\) sequentially as follows:

  (i) Start with a node labelled with 1 (the root) and no edges.

  (ii) If \(i<n\) nodes are present, add a node labelled with \(i+1\) and one edge connecting it to a node in [i] picked uniformly.

  (iii) Stop if n nodes are present.
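The sequential construction can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `random_recursive_tree` and the representation of the tree as a child-to-parent map are assumptions made here, not notation from [39].

```python
import random


def random_recursive_tree(n: int, rng: random.Random) -> dict:
    """Return a map parent[j] for j = 2, ..., n; node 1 is the root."""
    parent = {}
    for i in range(1, n):
        # i nodes are present: attach node i + 1 to a uniformly chosen node in [i].
        parent[i + 1] = rng.randint(1, i)
    return parent
```

Calling `random_recursive_tree(5, random.Random(0))` gives one realisation of \({\mathbb {T}}_5\); each entry `j: parent[j]` corresponds to one of the \(n - 1\) edges.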

The object \({\mathbb {T}}_n\) is a labelled tree in which each node has a single label. We consider a realisation of \({\mathbb {T}}_n\) and transform it over time into labelled trees with fewer nodes, whose nodes amass multiple labels.

  (i) Each edge of \({\mathbb {T}}_n\) is linked to an exponential clock. The clocks are i.i.d. Exp(1)-distributed.

  (ii) We wait for the first clock to ring. At this time, we cut/remove the edge whose clock rang first. The tree is thus split into two trees, one of which includes the node with label 1. We denote this tree by \({\mathbb {T}}^{(1)}\) and the other tree by \({\mathbb {T}}^{(2)}\). Let \(e_1\) be the node of \({\mathbb {T}}^{(1)}\) that was connected to the removed edge.

  (iii) All labels of \({\mathbb {T}}^{(2)}\) are added to the set of labels of \(e_1\). Remove \({\mathbb {T}}^{(2)}\) including its clocks.

  (iv) Repeat from (ii), using \({\mathbb {T}}^{(1)}\) labelled as in (iii) with the (remaining) clocks from (i). Stop when \({\mathbb {T}}^{(1)}\) in step (iii) consists of only a single node and no edges.

  (v) For any time t, the label sets at the nodes of \({\mathbb {T}}^{(1)}\) (of \({\mathbb {T}}_n\) before the first clock has rung) give a partition \(\varPi ^{(n)}_t\) of [n]. The process \((\varPi ^{(n)}_t)_{t\ge 0}\) is a Bolthausen–Sznitman n-coalescent (set \(\varPi ^{(n)}_t=[n]\) if t is later than the time at which we stopped the cutting procedure).
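The cutting procedure (i)–(v) can be simulated directly. The sketch below is an illustration under stated assumptions rather than the implementation of [39]: the function name `bolthausen_sznitman_by_cutting`, the identification of each edge with its child endpoint, and the output format (a list of jump times with the partition after each cut) are all choices made here. All clocks are sampled up front and a ring is ignored if its edge was already removed with an earlier \({\mathbb {T}}^{(2)}\), which corresponds to continuing with the remaining clocks in step (iv).

```python
import random


def bolthausen_sznitman_by_cutting(n: int, rng: random.Random) -> list:
    """Return [(time, partition), ...] for the cuts of a random recursive tree."""
    # Random recursive tree: node j >= 2 attaches to a uniform node in [j - 1].
    parent = {j: rng.randint(1, j - 1) for j in range(2, n + 1)}
    children = {i: [] for i in range(1, n + 1)}
    for j, p in parent.items():
        children[p].append(j)
    # Label sets of the nodes that are still part of the tree containing node 1.
    labels = {i: {i} for i in range(1, n + 1)}
    # Step (i): one Exp(1) clock per edge; the edge is identified by its child node.
    clocks = sorted((rng.expovariate(1.0), j) for j in parent)
    history = []
    for t, j in clocks:
        if j not in labels:
            continue  # this edge was removed together with an earlier subtree
        # Step (ii): cutting the edge above j splits off the subtree rooted at j.
        merged, stack = set(), [j]
        while stack:
            v = stack.pop()
            merged |= labels.pop(v)
            stack.extend(c for c in children[v] if c in labels)
        # Step (iii): all labels of the removed subtree move to e_1 = parent[j].
        labels[parent[j]] |= merged
        history.append((t, sorted(sorted(block) for block in labels.values())))
    return history
```

For example, `bolthausen_sznitman_by_cutting(5, random.Random(1))` returns the successive partitions of [5] for one realisation; the last entry always contains the single block with all labels.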

Figure 7 shows an illustration of steps (i)–(iii) for a realisation of \({\mathbb {T}}_5\).

Fig. 7 Example for the first cutting and relabelling step (ii), (iii) for the construction from [39]

Cite this article

Eldon, B., Freund, F. Genealogical Properties of Subsamples in Highly Fecund Populations. J Stat Phys 172, 175–207 (2018). https://doi.org/10.1007/s10955-018-2013-1
