# Genealogical Properties of Subsamples in Highly Fecund Populations

## Abstract

We consider some genealogical properties of nested samples. The complete sample is assumed to have been drawn from a natural population characterised by high fecundity and sweepstakes reproduction (abbreviated HFSR). Because of the HFSR assumption, the random gene genealogies of the samples are modelled by coalescent processes that admit multiple mergers of ancestral lineages looking back in time. Among the genealogical properties we consider is the probability that the most recent common ancestor is shared between the complete sample and a subsample nested within it; we also compare the lengths of ‘internal’ branches of nested genealogies between different coalescent processes. The results indicate how ‘informative’ a subsample is about the properties of the larger complete sample, how much information is gained by increasing the sample size, and how the ‘informativeness’ of the subsample varies between different coalescent processes.
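As an illustration of the first quantity mentioned above, the following is a minimal Monte Carlo sketch, not the method used in the paper. It assumes the Beta(2 − α, α)-coalescent as one example of a multiple-merger model (α = 2 is treated as the Kingman coalescent) and estimates the probability that a subsample of size m shares its most recent common ancestor with the complete sample of size n. All function names are hypothetical. Only the jump chain is simulated, since merger times do not affect whether the ancestor is shared. For the Kingman case the estimate can be compared with the known closed form (m − 1)(n + 1) / ((m + 1)(n − 1)).

```python
import math
import random


def merger_size_weights(b, alpha):
    """Unnormalised total rates of k-mergers (k = 2..b) among b blocks.

    alpha == 2 is treated as the Kingman coalescent (binary mergers only);
    0 < alpha < 2 gives the Beta(2 - alpha, alpha)-coalescent, whose rate
    for one specific k-merger is B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    if alpha == 2:
        return [1.0] + [0.0] * (b - 2)
    log_norm = math.lgamma(2 - alpha) + math.lgamma(alpha) - math.lgamma(2)
    weights = []
    for k in range(2, b + 1):
        log_beta = (math.lgamma(k - alpha) + math.lgamma(b - k + alpha)
                    - math.lgamma(b))
        weights.append(math.comb(b, k) * math.exp(log_beta - log_norm))
    return weights


def shares_mrca(n, m, alpha, rng):
    """Simulate one genealogy; return True if the subsample (size m) and the
    complete sample (size n) have the same most recent common ancestor."""
    # counts[i] = number of subsample lineages contained in block i
    counts = [1] * m + [0] * (n - m)
    while len(counts) > 1:
        b = len(counts)
        k = rng.choices(range(2, b + 1),
                        weights=merger_size_weights(b, alpha))[0]
        chosen = set(rng.sample(range(b), k))  # blocks that merge
        merged = sum(counts[i] for i in chosen)
        counts = [c for i, c in enumerate(counts) if i not in chosen] + [merged]
        if merged == m:
            # The subsample has just reached its MRCA; it is shared only if
            # this merger also reduced the complete sample to a single block.
            return len(counts) == 1
    return True


def estimate_shared_mrca(n, m, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of the probability of a shared MRCA."""
    rng = random.Random(seed)
    return sum(shares_mrca(n, m, alpha, rng) for _ in range(reps)) / reps


def kingman_shared_mrca(n, m):
    """Closed form for the Kingman coalescent, 2 <= m <= n."""
    return (m - 1) * (n + 1) / ((m + 1) * (n - 1))


if __name__ == "__main__":
    n, m = 20, 5
    print("Kingman, exact:    ", kingman_shared_mrca(n, m))
    print("Kingman, simulated:", estimate_shared_mrca(n, m, alpha=2))
    print("Beta, alpha = 1.5: ", estimate_shared_mrca(n, m, alpha=1.5))
```

Running the script prints the exact and simulated Kingman values side by side, followed by an estimate for α = 1.5. The choice n = 20, m = 5 is arbitrary, and the sketch is intended for small samples only.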

## Keywords

Coalescent, High fecundity, Nested samples, Multiple mergers, Time to most recent common ancestor

## Mathematics Subject Classification

92D15, 60J28

## Notes

### Acknowledgements

We thank Alison Etheridge for many valuable comments and suggestions, especially regarding Theorem 1. BE was funded by DFG grant STE 325/17-1 to Wolfgang Stephan through Priority Programme SPP1819: Rapid Evolutionary Adaptation. FF was funded by DFG grant FR 3633/2-1 through Priority Program 1590: Probabilistic Structures in Evolution.
