Abstract
Recombination is a major force shaping genetic diversity. Accurate determination of recombination rates is important and can, in principle, be improved by increasing the sample size. However, traditional population genetics methods are so computationally demanding that estimating recombination rates with them is nearly impossible when the sample size is large. In this study, we used a refined machine learning approach to estimate recombination rates across the human genome. We applied it to the UK10K human genomic data set of 7,562 genomic sequences and to three of its subsets containing 200, 400 and 2,000 genomic sequences. Estimation was performed under the human Out-of-Africa demographic model. We obtained an accurate human genetic map. We also found that the fluctuation of the estimated recombination rate along the human genome decreases as the sample size increases: the recombination rate heterogeneity estimated from the full UK10K data set is lower than that estimated from its subsets. Our results show how sample size affects recombination rate estimates and demonstrate that analyzing a larger number of genomes yields more precise estimates. The accurate genetic map based on the UK10K data set is also expected to benefit other research in human biology.
References
Altemose N, Noor N, Bitoun E, Tumian A, Imbeault M, Chapman JR, Aricescu AR, Myers SR (2017) A map of human PRDM9 binding provides evidence for novel behaviors of PRDM9 and other zinc-finger proteins in meiosis. Elife 6:e28383
Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE, Flicek P, Gabriel SB, Gibbs RA, Green ED, Hurles ME, Knoppers BM, Korbel JO, Lander ES, Lee C, Lehrach H, Mardis ER, Marth GT, McVean GA, Nickerson DA, Schmidt JP, Sherry ST, Wang J, Wilson RK, Gibbs RA, Dinh H, Kovar C, Lee S, Lewis L, Muzny D, Reid J, Wang M, Wang J, Fang XD, Guo XS, Jian M, Jiang H, Jin X, Li GQ, Li JX, Li YR, Li Z, Liu X, Lu Y, Ma XD, Su Z, Tai SS, Tang MF, Wang B, Wang GB, Wu HL, Wu RH, Yin Y, Zhang WW, Zhao J, Zhao MR, Zheng XL, Zhou Y, Lander ES, Altshuler DM, Gabriel SB, Gupta N, Flicek P, Clarke L, Leinonen R, Smith RE, Zheng-Bradley X, Bentley DR, Grocock R, Humphray S, James T, Kingsbury Z, Lehrach H, Sudbrak R, Albrecht MW, Amstislavskiy VS, Borodina TA, Lienhard M, Mertes F, Sultan M, Timmermann B, Yaspo ML, Sherry ST, McVean GA, Mardis ER, Wilson RK, Fulton L, Fulton R, Weinstock GM, Durbin RM, Balasubramaniam S, Burton J, Danecek P, Keane TM, Kolb-Kokocinski A, McCarthy S, Stalker J, Quail M et al (2012) An integrated map of genetic variation from 1092 human genomes. Nature 491:56–65
Arbeithuber B, Betancourt AJ, Ebner T, Tiemann-Boege I (2015) Crossovers are associated with mutation and biased gene conversion at recombination hotspots. Proc Natl Acad Sci USA 112:2109–2114
Ardlie KG, Kruglyak L, Seielstad M (2002) Patterns of linkage disequilibrium in the human genome. Nat Rev Genet 3:299–309
Auton A, McVean G (2007) Recombination rate estimation in the presence of hotspots. Genome Res 17:1219–1227
Bell AD, Mello CJ, Nemesh J, Brumbaugh SA, Wysoker A, McCarroll SA (2020) Insights into variation in meiosis from 31,228 human sperm genomes. Nature 583:259–264
Buhlmann P (2006) Boosting for high-dimensional linear models. Ann Stat 34:559–583
Coop G, Przeworski M (2007) An evolutionary view of human recombination. Nat Rev Genet 8:23–34
Cullen M, Perfetto SP, Klitz W, Nelson G, Carrington M (2002) High-resolution patterns of meiotic recombination across the human major histocompatibility complex. Am J Hum Genet 71:759–776
Dapper AL, Payseur BA (2018) Effects of demographic history on the detection of recombination hotspots from linkage disequilibrium. Mol Biol Evol 35:335–353
Fearnhead P, Donnelly P (2001) Estimating recombination rates from population genetic data. Genetics 159:1299–1318
Flagel L, Brandvain Y, Schrider DR (2019) The unreasonable effectiveness of convolutional neural networks in population genetic inference. Mol Biol Evol 36:220–238
Francioli LC, Polak PP, Koren A, Menelaou A, Chun S, Renkens I, van Duijn CM, Swertz M, Wijmenga C, van Ommen G, Slagboom PE, Boomsma DI, Ye K, Guryev V, Arndt PF, Kloosterman WP, de Bakker PIW, Sunyaev SR, Genome of the Netherlands Consortium (2015) Genome-wide patterns and properties of de novo mutations in humans. Nat Genet 47:822–826
Fu YX (2006) Exact coalescent for the Wright-Fisher model. Theor Popul Biol 69:385–394
Gao F, Ming C, Hu WJ, Li HP (2016) New software for the fast estimation of population recombination rates (FastEPRR) in the genomic era. G3 6:1563–1571
Gartner K, Futschik A (2016) Improved versions of common estimators of the recombination rate. J Comput Biol 23:756–768
Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu FL, Gibbs RA, Bustamante CD, 1000 Genomes Project (2011) Demographic history and rare allele sharing among human populations. Proc Natl Acad Sci USA 108:11983–11988
Halldorsson BV, Palsson G, Stefansson OA, Jonsson H, Hardarson MT, Eggertsson HP, Gunnarsson B, Oddsson A, Halldorsson GH, Zink F, Gudjonsson SA, Frigge ML, Thorleifsson G, Sigurdsson A, Stacey SN, Sulem P, Masson G, Helgason A, Gudbjartsson DF, Thorsteinsdottir U, Stefansson K (2019) Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363:eaau1043
Hassan S, Surakka I, Taskinen MR, Salomaa V, Palotie A, Wessman M, Tukiainen T, Pirinen M, Palta P, Ripatti S (2021) High-resolution population-specific recombination rates and their effect on phasing and genotype imputation. Eur J Hum Genet 29:615–624
Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Sella G, Przeworski M, 1000 Genomes Project (2011) Classic selective sweeps were rare in recent human evolution. Science 331:920–924
Hill WG, Robertson A (1968) Linkage disequilibrium in finite populations. Theor Appl Genet 38:226–231
Hothorn T, Buehlmann P, Kneib T, Schmid M, Hofner B (2018) mboost: model-based boosting. R package version 2.9-1. https://CRAN.R-project.org/package=mboost
Hu WJ, Hao ZQ, Du PY, Di Vincenzo F, Manzi G, Pan YH, Li H (2021) Genomic inference of a human super bottleneck in Mid-Pleistocene transition. bioRxiv. https://doi.org/10.1101/2021.05.16.444351
Hudson RR (2001) Two-locus sampling distributions and their application. Genetics 159:1805–1817
Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18:337–338
Hussin JG, Hodgkinson A, Idaghdour Y, Grenier JC, Goulet JP, Gbeha E, Hip-Ki E, Awadalla P (2015) Recombination affects accumulation of damaging and disease-associated mutations in human populations. Nat Genet 47:400–404
Jeffreys AJ, Kauppi L, Neumann R (2001) Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29:217–222
Kamm JA, Spence JP, Chan J, Song YS (2016) Two-locus likelihoods under variable population size and fine-scale recombination rate estimation. Genetics 203:1381–1399
Keinan A, Reich D (2010) Human population differentiation is strongly correlated with local recombination rate. PLoS Genet 6:e1000886
Kingman JFC (1982) On the genealogy of large populations. J Appl Probab 19:27–43
Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A, Walters GB, Jonasdottir A, Gylfason A, Kristinsson KT, Gudjonsson SA, Frigge ML, Helgason A, Thorsteinsdottir U, Stefansson K (2010) Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467:1099–1103
Kong A, Thorleifsson G, Frigge ML, Masson G, Gudbjartsson DF, Villemoes R, Magnusdottir E, Olafsdottir SB, Thorsteinsdottir U, Stefansson K (2014) Common and low-frequency variants associated with genome-wide recombination rate. Nat Genet 46:11–16
Li H, Stephan W (2005) Maximum-likelihood methods for detecting recent positive selection and localizing the selected site in the genome. Genetics 171:377–384
Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165:2213–2233
Lin K, Li H, Schlotterer C, Futschik A (2011) Distinguishing positive selection from neutral evolution: Boosting the performance of summary statistics. Genetics 187:229–244
Lin K, Futschik A, Li H (2013) A fast estimate for the population recombination rate based on regression. Genetics 194:473–484
McVean G, Awadalla P, Fearnhead P (2002) A coalescent-based method for detecting and estimating recombination from gene sequences. Genetics 160:1231–1241
McVean GAT, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P (2004) The fine-scale structure of recombination rate variation in the human genome. Science 304:581–584
Miretti MM, Walsh EC, Ke XY, Delgado M, Griffiths M, Hunt S, Morrison J, Whittaker P, Lander ES, Cardon LR, Bentley DR, Rioux JD, Beck S, Deloukas P (2005) A high-resolution linkage-disequilibrium map of the human major histocompatibility complex and first generation of tag single-nucleotide polymorphisms. Am J Hum Genet 76:634–646
Myers S, Bottolo L, Freeman C, McVean G, Donnelly P (2005) A fine-scale map of recombination rates and hotspots across the human genome. Science 310:321–324
O’Connell J, Gurdasani D, Delaneau O, Pirastu N, Ulivi S, Cocca M, Traglia M, Huang J, Huffman JE, Rudan I, McQuillan R, Fraser RM, Campbell H, Polasek O, Asiki G, Ekoru K, Hayward C, Wright AF, Vitart V, Navarro P, Zagury JF, Wilson JF, Toniolo D, Gasparini P, Soranzo N, Sandhu MS, Marchini J (2014) A general approach for haplotype phasing across the full spectrum of relatedness. PLoS Genet 10:e1004234
Ohta T, Kimura M (1971) Linkage disequilibrium between two segregating nucleotide sites under the steady flux of mutations in a finite population. Genetics 68:571–580
Pavlidis P, Jensen JD, Stephan W (2010) Searching for footprints of positive selection in whole-genome SNP data from nonequilibrium populations. Genetics 185:907–922
Payseur BA, Rieseberg LH (2016) A genomic perspective on hybridization and speciation. Mol Ecol 25:2337–2360
Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, Beaty TH, Mathias R, Reich D, Myers S (2009) Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genet 5:e1000519
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Sall T, Nilsson NO (1994) The robustness of recombination frequency estimates in intercrosses with dominant markers. Genetics 137:589–596
Schiffels S, Durbin R (2014) Inferring human population size and separation history from multiple genome sequences. Nat Genet 46:919–925
Schrider DR, Kern AD (2018) Supervised machine learning for population genetics: a new paradigm. Trends Genet 34:301–312
Schumer M, Xu CL, Powell DL, Durvasula A, Skov L, Holland C, Blazier JC, Sankararaman S, Andolfatto P, Rosenthal GG, Przeworski M (2018) Natural selection interacts with recombination to shape the evolution of hybrid genomes. Science 360:656–659
Spence JP, Song YS (2019) Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. Sci Adv 5:eaaw9206
Stapley J, Feulner PGD, Johnston SE, Santure AW, Smadja CM (2017) Recombination: the good, the bad and the variable. Philos Trans R Soc Lond B Biol Sci 372:20170279
Stevison LS, Woerner AE, Kidd JM, Kelley JL, Veeramah KR, McManus KF, Bustamante CD, Hammer MF, Wall JD (2016) The time scale of recombination rate evolution in Great Apes. Mol Biol Evol 33:928–945
Terhorst J, Kamm JA, Song YS (2017) Robust and scalable inference of population history from hundreds of unphased whole genomes. Nat Genet 49:303–309
van Eeden G, Uren C, Moller M, Henn BM (2021) Inferring recombination patterns in African populations. Hum Mol Genet 30:R11–R16
Wall JD (2000) A comparison of estimators of the population recombination rate. Mol Biol Evol 17:156–163
Walter K, Min JL, Huang J, Crooks L, Memari Y, McCarthy S, Perry JRB, Xu C, Futema M, Lawson D, Iotchkova V, Schiffels S, Hendricks AE, Danecek P, Li R, Floyd J, Wain LV, Barroso I, Humphries SE, Hurles ME, Zeggini E, Barrett JC, Plagnol V, Richards JB, Greenwood CMT, Timpson NJ, Durbin R, Soranzo N, Bala S, Clapham P, Coates G, Cox T, Daly A, Danecek P, Du Y, Durbin R, Edkins S, Ellis P, Flicek P, Guo X, Guo X, Huang L, Jackson DK, Joyce C, Keane T, Kolb-Kokocinski A, Langford C, Li Y, Liang J, Lin H, Liu R, Maslen J, McCarthy S, Muddyman D, Quail MA, Stalker J, Sun J, Tian J, Wang G, Wang J, Wang Y, Wong K, Zhang P, Barroso I, Birney E, Boustred C, Chen L, Clement G, Cocca M, Danecek P, Smith GD, Day INM, Day-Williams A, Down T, Dunham I, Durbin R, Evans DM, Gaunt TR, Geihs M, Greenwood CMT, Hart D, Hendricks AE, Howie B, Huang J, Hubbard T, Hysi P, Iotchkova V, Jamshidi Y, Karczewski KJ, Kemp JP, Lachance G, Lawson D, Lek M, Lopes M, MacArthur DG, Marchini J, Mangino M, Mathieson I, McCarthy S, Memari Y et al (2015) The UK10K project identifies rare variants in health and disease. Nature 526:82–89
Wang GD, Larson G, Kidd JM, vonHoldt BM, Ostrander EA, Zhang YP (2019) Dog10K: the international consortium of canine genome sequencing. Natl Sci Rev 6:611–613
Webb AJ, Berg IL, Jeffreys A (2008) Sperm cross-over activity in regions of the human genome showing extreme breakdown of marker association. Proc Natl Acad Sci USA 105:10471–10476
Wegmann D, Kessner DE, Veeramah KR, Mathias RA, Nicolae DL, Yanek LR, Sun YV, Torgerson DG, Rafaels N, Mosley T, Becker LC, Ruczinski I, Beaty TH, Kardia SLR, Meyers DA, Barnes KC, Becker DM, Freimer NB, Novembre J (2011) Recombination rates in admixed individuals identified by ancestry-based inference. Nat Genet 43:847–853
Weiss KM, Clark AG (2002) Linkage disequilibrium and the mapping of complex human traits. Trends Genet 18:19–24
Wirtz J, Wiehe T (2019) The evolving Moran genealogy. Theor Popul Biol 130:94–105
Wu RG, Li HX, Peng D, Li R, Zhang YM, Hao B, Huang EW, Zheng CH, Sun HY (2019) Revisiting the potential power of human leukocyte antigen (HLA) genes on relationship testing by massively parallel sequencing-based HLA typing in an extended family. J Hum Genet 64:29–38
Yu DL, Dong LL, Yan FQ, Mu HL, Tang BX, Yang X, Zeng T, Zhou Q, Gao F, Wang ZH, Hao ZQ, Kang HE, Zheng Y, Huang HW, Wei YZ, Pan W, Xu YC, Zhu JW, Zhao SL, Wang CR, Wang PY, Dai L, Li MS, Lan L, Wang YW, Chen H, Li YX, Fu YX, Shao Z, Bao YM, Zhao FQ, Chen LN, Zhang GQ, Zhao WM, Li HP (2019) eGPS 1.0: Comprehensive software for multi-omic and evolutionary analyses. Natl Sci Rev 6:867–869
Acknowledgements
We thank the UK10K Project Consortium for sharing the data.
Funding
This work was supported by grants from the National Natural Science Foundation of China (Nos. 31100273, 31172073, 91131010), the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDB38030100), the National Key Research and Development Project (No. 2020YFC0847000) and the Shanghai Institute of Nutrition and Health (No. JBGSRWBD-SINH-2021-10).
Author information
Contributions
ZH, PD, YHP, and HL conceived and designed the research; ZH and HL wrote the code; ZH and PD analyzed the data; ZH, PD, YHP, and HL wrote the paper.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Availability of data and material
The data sets used in this study are available from the UK10K Project Consortium (Walter et al. 2015) (https://www.uk10k.org/). The genetic maps of the OMNI data set, built with LDhat (Auton and McVean 2007), were downloaded from the 1000 Genomes Project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/20130507_omni_recombination_rates).
Code availability
FastEPRR 2.0 is written in R and is integrated into the eGPS cloud (Yu et al. 2019) (http://www.egps-software.net). The desktop version and the genetic maps established in this study are freely available on the institute website (https://www.picb.ac.cn/evolgen/).
Supplementary Information
The electronic supplementary material for this article is available online.
About this article
Cite this article
Hao, Z., Du, P., Pan, YH. et al. Fine human genetic map based on UK10K data set. Hum Genet 141, 273–281 (2022). https://doi.org/10.1007/s00439-021-02415-8
DOI: https://doi.org/10.1007/s00439-021-02415-8