The genetic trait of lactase persistence (LP) is associated with at least five independent functional single nucleotide variants in a regulatory region about 14 kb upstream of the lactase gene [−13910*T (rs4988235), −13907*G (rs41525747), −13915*G (rs41380347), −14009*G (rs869051967) and −14010*C (rs145946881)]. These alleles have been inferred to have spread recently and present-day frequencies have been attributed to positive selection for the ability of adult humans to digest lactose without risk of symptoms of lactose intolerance. One of the inferential approaches used to estimate the level of past selection has been to determine the extent of haplotype homozygosity (EHH) of the sequence surrounding the SNP of interest. We report here new data on the frequencies of the known LP alleles in the ‘Old World’ and their haplotype lineages. We examine and confirm EHH of each of the LP alleles in relation to their distinct lineages, but also show marked EHH for one of the older haplotypes that does not carry any of the five LP alleles. The region of EHH of this (B) haplotype exactly coincides with a region of suppressed recombination that is detectable in families as well as in population data, and the results show how such suppression may have exaggerated haplotype-based measures of past selection.
There is now good functional evidence that the genetic trait of persistence of intestinal lactase activity into adult life can be caused by five or more independent single nucleotide variants in a regulatory region (a transcriptional enhancer) upstream of the lactase gene LCT (Fang et al. 2012; Ingram et al. 2007; Jensen et al. 2011; Liebert et al. 2016; Olds and Sibley 2003; Troelsen et al. 2003). One of these, −13910*T (rs4988235) (Enattah et al. 2002) has almost reached fixation in some parts of Europe, while others such as −13907*G (rs41525747), −13915*G (rs41380347), −14009*G (rs869051967) and −14010*C (rs145946881) are found at variable frequencies in the Middle East and Africa (Enattah et al. 2008; Ingram et al. 2007, 2009; Tishkoff et al. 2007). Present-day frequencies of these alleles have been attributed to positive selection for lactase persistence, which allows the dietary consumption of animal milk by adult humans without risk of symptoms of lactose intolerance (Allentoft et al. 2015; Bersaglieri et al. 2004; Gallego Romero et al. 2012; Gerbault et al. 2009; Itan et al. 2009; Mathieson et al. 2015; Schlebusch et al. 2013; Sverrisdottir et al. 2014; Tishkoff et al. 2007). The distributions of these alleles have also clearly been influenced by other processes including population expansion, migration and allele surfing, and cultural/environmental processes (reviewed in Gerbault et al. 2011).
Ancient DNA data have the potential to address the question of where and when an allele first occurred at significant frequency. Increased availability of such data for the European lactase persistence (LP) associated allele, −13910*T, has allowed some degree of geo-temporal mapping of its distribution, although so far there are insufficient data to track its geographic origin. The earliest occurrences have been reported in Spain, dated to about 5000 years BP (Plantinga et al. 2012), although because these results were obtained through PCR-based technology, the possibility of contamination cannot be ruled out. Using next-generation sequencing (NGS), the earliest detections of the allele were in Germany and Sweden about 4000 years BP (Allentoft et al. 2015; Haak et al. 2015; Mathieson et al. 2015), highlighting its very recent expansion (see Supplementary data for full list of references). There are no reports yet of the other alleles in ancient samples, but genetic evidence points to rather recent spread for all five functional variants (Enattah et al. 2008; Jones et al. 2013; Priehodova et al. 2017; Schlebusch et al. 2013; Tishkoff et al. 2007). Estimation of the age of expansion of the European allele using population genetic and modelling approaches places it during the Neolithic period and suggests selection coefficients ranging from 0.8 to 19% (Bersaglieri et al. 2004; Gerbault et al. 2009; Itan et al. 2009). Such coefficients are extraordinarily high in view of the fact that the cultural adaptation of fermentation of milk products, which reduces the lactose concentration, allows milk to be used as a source of calories in the diet of lactase non-persistent people, circumventing its adverse effects (Segurel and Bon 2017).
One inferential approach frequently used to identify signatures of selection is to determine the extended haplotype homozygosity (EHH) of the sequence surrounding a variant of interest (Bersaglieri et al. 2004; Sabeti et al. 2002; Tishkoff et al. 2007). This method is relatively straightforward when only one functional allele is present at appreciable frequency. Such is the case for LP-associated alleles in Europe and Tanzania, i.e. −13910*T (rs4988235) and −14010*C (rs145946881), respectively. However, the occurrence of several different putative selected alleles in the same sample, as is the case in Ethiopia (Jones et al. 2013, 2015), can complicate interpretation, since these alleles can each be associated with different extended haplotypes. Furthermore, Gallego Romero and colleagues (Gallego Romero et al. 2012) recently reported a common extended haplotype that is not associated with LP.
In this study, we evaluate ‘Old World’ allele frequencies of the known functional LP alleles, as well as other alleles within the LCT enhancer region, adding extensive new data, examine the haplotype backgrounds of each variant and compare extended haplotype homozygosity with those of the corresponding ancestral haplotypes. We also investigate in detail the level of recombination in the chromosomal region, using the HapMap and Icelandic populations.
Materials and methods
Samples newly tested for this paper (2056 individuals from 52 populations) included groups collected under the auspices of ethical committee approvals UCLH 99/0196 and 01/0236. DNA was extracted from buccal samples by various adaptations of the phenol chloroform method. Individual samples were grouped according to the country in which they were collected, and the continental geographic region in which the country is located, namely Northwest/Central Europe, South Europe, East/Southeast Europe, the Middle East, West Asia, Central/South Asia and East/Southeast Asia (labelled Europe-N, Europe-S, Europe-E, M-East, Asia-W, Asia-S and Asia-E, respectively, in Table 1, and see Supplementary Table 2a for groupings). A further categorization was made into distinct cultural groups with a minimum sample size of 10 individuals, using self-declared cultural identity/ethnic background, if such information was available, or geographic subgroups within countries where there was more precise information about the sample localization.
LCT enhancer sequences from all 2056 DNA samples were obtained from a 706 bp fragment in intron 13 of MCM6, PCR amplified as described previously (Ingram et al. 2009; Jones et al. 2013). Supplementary Table 1 shows the primers, locations and cycling conditions. All fragments were sequenced in both directions using a modified version of the Sanger Method and run on an ABI 3730xl DNA Analyzer (Applied Biosystems).
80 kb Haplotype background of enhancer variants
For a subset of 880 samples that included 354 of the newly typed samples as well as European, Middle Eastern and African samples from populations previously analysed by our group (Ingram et al. 2007, 2009; Jones et al. 2013), additional sequencing and genotyping were performed to obtain data to deduce the 80 kb haplotype background of the enhancer variants (see Supplementary Fig. 1 for all variants). Sequences were obtained from two regions flanking the LCT enhancer, a 683 bp haplotype-defining region upstream of LCT (Hollox et al. 2001) and a 701 bp region in intron 4 of MCM6 (Jones et al. 2013). The LCT gene region haplotype markers in exon 2 (666 G>A) and exon 17 (5579 T>C) of LCT were genotyped by LGC Genomics (Teddington, Middlesex, UK) using Kompetitive Allele Specific PCR (KASP) technology (http://www.lgcgroup.com/products/kasp-genotyping-chemistry/#.WbAZy62ZM5g).
PHASE v. 2.1.1 (Stephens et al. 2001; Stephens and Donnelly 2003) was used to infer haplotypes for a final data set of 855 individuals (see Supplementary Table 6). Samples with more than 10% missing data as well as positions with alleles occurring only once were excluded from PHASE analysis.
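The two exclusion rules applied before phasing can be illustrated with a minimal Python sketch. It is not the pipeline used for this study: the data layout (a dictionary of per-site diploid calls, with None for a missing call), the function name filter_for_phasing and the order of the two steps (samples first, then sites) are assumptions made for illustration only.

```python
from collections import Counter


def filter_for_phasing(genotypes, max_missing=0.10):
    """Apply the two pre-phasing exclusions to unphased genotype calls.

    genotypes: dict mapping sample id -> list of per-site calls, where each
    call is a tuple of two alleles, or None if the call is missing
    (hypothetical layout). Returns (filtered_genotypes, kept_site_indices).
    """
    # Step 1: exclude samples with more than max_missing missing calls.
    kept = {
        sample: calls
        for sample, calls in genotypes.items()
        if sum(call is None for call in calls) / len(calls) <= max_missing
    }
    if not kept:
        return {}, []

    n_sites = len(next(iter(kept.values())))

    # Step 2: exclude sites at which any allele is observed exactly once
    # among the retained samples (assumed reading of "occurring only once").
    keep_sites = []
    for site in range(n_sites):
        counts = Counter(
            allele
            for calls in kept.values()
            if calls[site] is not None
            for allele in calls[site]
        )
        if counts and min(counts.values()) > 1:
            keep_sites.append(site)

    filtered = {
        sample: [calls[i] for i in keep_sites] for sample, calls in kept.items()
    }
    return filtered, keep_sites


# Toy data: s4 is missing 2 of 3 calls; site 1 carries a singleton G allele.
genotypes = {
    "s1": [("C", "C"), ("A", "A"), ("G", "G")],
    "s2": [("C", "T"), ("A", "A"), ("G", "G")],
    "s3": [("C", "T"), ("A", "G"), ("G", "G")],
    "s4": [None, None, ("G", "G")],
}
filtered, keep_sites = filter_for_phasing(genotypes)
```

In this toy example sample s4 is dropped for missing data and the middle site is dropped because its G allele is seen only once, leaving sites 0 and 2 for the three remaining samples.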
The software, Network (version 22.214.171.124, http://www.fluxus-engineering.com) was used to construct a haplotype network for this 80 kb genetic region.
Linkage disequilibrium unit (LDU) and genetic (cM) maps
LDU maps (Maniatis et al. 2007) were constructed using data from all the populations of the HapMap Project release #28 (International HapMap3 Consortium 2010). The sex-averaged family cM map based on linkage data from the large Icelandic families was taken from Kong et al. (2010) (sex-averaged.rmap, https://www.decode.com/addendum/).
Extended haplotype homozygosity (EHH)
In addition to the two LCT haplotype markers, a further 34 loci flanking the enhancer were selected for KASP genotyping by LGC Genomics (details above) to extend the haplotype analysis to 1.77 Mb surrounding LCT. These SNPs were selected with the aim of distributing them at an average distance of 50 kb apart. This distribution was adjusted to take into account the LDU maps from the HapMap populations: in regions of high LD the markers were spread out, while in regions of lower LD they were placed slightly closer together. The full set of SNPs is shown in Supplementary Table 4 with their physical positions along the chromosome.
Haplotypes were determined (PHASE v. 2.1.1) for a final set of 837 individuals (of the 855 above) (Supplementary Table 2b) with nearly complete data (samples with > 10% missing were excluded). The full set of SNPs spread across the 1.77 Mb region was used to measure EHH using the Selscan v1.1.0b package (Szpiech and Hernandez 2014), for each major population group (Europe, Africa, Asia and Middle East) and using each of the SNPs under test as core. SNPs with minor allele frequency < 0.05 were not included in the analysis. The integrated haplotype scores (iHS) were also determined using the Selscan v1.1.0b package. We used the physical map as a proxy for the genetic map because when the genetic distance is zero over several SNPs, the iHS algorithm fails to return results for all SNPs.
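For readers unfamiliar with the statistic, the following Python sketch shows the standard definition of EHH (Sabeti et al. 2002): the probability that two randomly chosen chromosomes carrying a given core allele are identical at every marker from the core out to a given marker. It is an illustration only, not the Selscan implementation used here, and the function name, argument names and toy haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, marker_index):
    """EHH of core_allele from the core SNP out to marker_index.

    haplotypes: list of phased haplotypes, each a string (or tuple) with one
    allele per marker, ordered by physical position.
    """
    lo, hi = sorted((core_index, marker_index))
    # Keep only chromosomes carrying the core allele, trimmed to the interval.
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carriers
    # Group identical extended haplotypes and count identical pairs.
    groups = Counter(carriers)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Toy example: core SNP at index 0, core allele "A" (four carriers).
haps = ["ATGC", "ATGC", "ATGA", "ACGC", "GTGC"]
at_core = ehh(haps, 0, "A", 0)   # all carriers identical at the core itself
one_out = ehh(haps, 0, "A", 1)   # three of four carriers still identical
three_out = ehh(haps, 0, "A", 3) # only one identical pair remains
```

EHH is 1 at the core and decays as recombination and mutation break up the carrier haplotypes; computing it separately for the derived and ancestral core alleles gives the comparison on which iHS is based.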
Sequencing revealed a total of 22 derived alleles within the LCT enhancer region. The ancestral state of these SNPs was determined by sequence comparison with other primate species, and was in each case the same as the common allele in humans. Of the 22 derived alleles, 10 occurred more than once. Table 1 shows their allele frequencies in each major geographic area, apart from Africa, which we have reported previously (Jones et al. 2015). The new data in Table 1 include three of the five established functional variants (−13910*T, rs4988235; −13907*G, rs41525747 and −13915*G, rs41380347), as well as −14011*T (rs4988233) and −13779*C (rs527991977), for which there is more limited evidence of function (Liebert et al. 2016). The other two established African functional alleles (−14009*G, rs869051967 and −14010*C, rs145946881) were only found as singletons. Our own previously reported African data for this genomic region, as well as data reported in the literature by others, were combined with the new data (Supplementary Table 3, in which references are given) and used to examine the geographic distribution of the five most well established functional variants (Supplementary Fig. 2), and to show the distribution of −13910*T in Europe comparing modern and ancient data (Supplementary Fig. 3).
80 kb haplotypes were determined using PHASE. The numbered haplotypes were also assigned to the previously reported LCT gene region haplotypes using the five LCT gene region haplotype-defining SNPs, i.e. −958C>T, −943/2 TC>Del, −678G>A, 666G>A and 5579T>C (Hollox et al. 2001). Supplementary Tables 5 and 6 show the results of the PHASE analysis, and the haplotype backgrounds of the derived alleles for each of the enhancer variants. In agreement with previous studies (Bersaglieri et al. 2004; Coelho et al. 2005; Poulter et al. 2003), nearly all the −13910*T alleles were found to be on the same 80 kb haplotype (24), associated with an LCT gene region A haplotype. Just one −13910*T allele, in a single UK individual, was on a different haplotype, most likely due to a recombination event between −678G>A and LCT exon 2 (666G>A), since the haplotype is the same as haplotype 24 up to position −678. Myles and colleagues (Myles et al. 2005) found 8 similar cases in Moroccan and Algerian Berber populations. Except for three alleles, −13907*G is located on haplotype 21, which is also associated with an A haplotype background. −13603*T (haplotype 15) and most of the −14011*T variants are also associated with the LCT haplotype A. However, two −14011*T alleles were found associated with different B haplotype backgrounds, and if the assignments are correct, that might suggest this mutation happened more than once independently, probably in geographically distinct places.
The derived allele −13495*T (rs4954490), located just outside the enhancer region, also occurs as a derived allele on the ancestral A haplotype (Supplementary Table 5) and is associated in almost all cases with −13910*T and −13907*G, as well as −14011*T and −13603*T, indicating that −13495*T (rs4954490) predates these enhancer region alleles.
With the combination of loci used in this study, it was possible to distinguish between B and P haplotypes and confirm that −14010*C lies on a haplotype 82 background, associated exclusively with the P haplotype (Jones et al. 2013). Also, in agreement with previous studies (Ingram et al. 2009; Jones et al. 2013), the vast majority of −13915*G alleles are located on a C-associated haplotype background (haplotype 56), and −14009*G is mostly on haplotype 26, associated with the LCT haplotype X, with just one allele on an ancestral H background, which is also the background of the majority of the −13730*G variants (haplotype 17). The variants −13806*G and −13779*C occur exclusively on C-associated haplotype backgrounds (haplotypes 53 and 52, respectively). −13913*C resides on the B haplotype-associated haplotype 80.
A haplotype network was constructed for which most branches reflect unique stepwise mutational events, with relatively few recombination events (shown as ovals) needing to be inferred (Fig. 1a). The network illustrates the distant relationships of the five functional variants. The geographic and simplified ethno-linguistic group distribution of these haplotypes is illustrated on a map in Fig. 1b and shown in more detail in Supplementary Table 6.
Extended haplotype homozygosity (EHH)
EHH analysis was conducted for the four major population groups using each of the functional SNPs as core. All five derived alleles show evidence of EHH. To separate out the effects of the five variants, the chromosomes were separated by major LCT haplotype (A, B, C, P, U, X) and the EHH of the derived and ancestral alleles of the same LCT background haplotype compared (Fig. 2a). All five functional derived alleles have markedly more extended EHH than the corresponding ancestral haplotypes. The pattern of haplotype decay of the derived alleles is similar in all cases, decaying sharply between 300 and 500 kilobases (kb) on the right of the core SNP and being very much more extended on the left. The LCT haplotype B shows an EHH pattern somewhat like that of the LP alleles and, unlike the other ancestral LCT haplotypes (Fig. 2a; Supplementary Fig. 4), does show evidence of EHH. Figure 2a and b show the pattern of EHH on the B haplotype, using the haplotype-defining SNP rs56064699 (−958*T) as core marker. This haplotype is extended in all four major population groups, although to a lesser extent than the haplotypes carrying LP-associated alleles. Strikingly, the decay of EHH occurs at the same position in relation to the markers under test in all four major geographic groups (Fig. 2b).
EHH in relation to linkage disequilibrium maps (LDU) and centiMorgan (cM)
There is little expectation that there has been any selection for the derived allele at rs56064699, because none of the common functional variants occur on this haplotype and the marker alleles for this haplotype are less frequent in lactose digesters than non-digesters in all groups tested (Ingram et al. 2007; Jones et al. 2013; Ranciaro et al. 2014). We therefore sought other possible explanations for the apparent high EHH of this haplotype, and examined the pattern of linkage disequilibrium in this region. Figure 2c shows the alignment of the LDU maps constructed for each of the HapMap populations (acronyms next to the right axis). There is a clear extended region of LD in all populations irrespective of whether LP alleles are present (black dashed lines) or not (grey lines). Since measures of LD not only capture historic recombination but are also affected by factors such as selection and demography, we sought to examine recombination in family data, which capture only recombination events. Notably, the fine-scale Icelandic (deCODE) cM map (red line) coincides with the LDU maps, confirming that the LD pattern is an effect of recombination. Regrettably, there are no suitable data available to construct family cM maps in non-Europeans, but the deCODE map has surprisingly good marker coverage in the region, which means that recombinations can be determined quite accurately, despite the high frequency of LP chromosomes (~ 80%), which decreases the diversity in this population. Moreover (Fig. 2b, c), there is a near correspondence between the extent of the conserved (flat) strong LD/non-recombining region and the most frequent extended B haplotype (grey shaded area).
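The notion of a "flat" stretch in a cumulative LDU or cM map can be made concrete with a short Python sketch. It simply reports the longest run of adjacent markers between which the cumulative map does not rise by more than a chosen step. The function name, the max_step threshold and the toy positions and map values are hypothetical and are not taken from the HapMap or deCODE data.

```python
def longest_flat_block(positions, cumulative_map, max_step=0.0):
    """Find the longest physical stretch over which a cumulative map is flat.

    positions: marker positions in bp, in ascending order.
    cumulative_map: cumulative LDU or cM value at each marker.
    max_step: largest rise between adjacent markers still counted as flat
    (assumed threshold; 0.0 means no detectable recombination).
    Returns (start_position, end_position), or None if no flat run exists.
    """
    best = None
    start = 0
    n = len(positions)
    for i in range(1, n + 1):
        # A run ends at the last marker or where the map rises too steeply.
        run_ends = i == n or cumulative_map[i] - cumulative_map[i - 1] > max_step
        if run_ends:
            if i - 1 > start:  # run spans at least two markers
                span = positions[i - 1] - positions[start]
                if best is None or span > best[1] - best[0]:
                    best = (positions[start], positions[i - 1])
            start = i
    return best


# Toy map: no increase in map distance between 200 and 500 bp.
positions = [100, 200, 300, 400, 500, 600]
cumulative_cm = [0.0, 0.5, 0.5, 0.5, 0.5, 1.2]
block = longest_flat_block(positions, cumulative_cm)
```

Running the same function on a population LDU map and on a family-based cM map, and then comparing the returned intervals with the span of an extended haplotype, is one simple way to check whether the three coincide.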
This work provides a comprehensive view of the Old World distribution of known LP-associated alleles; an updated database can be found in Supplementary Table 3. We observe clear geographic distribution differences for each of the derived enhancer region alleles, even though some of them co-occur in East Africa. Although it might be tempting to speculate that the regions of highest frequency are the regions where the alleles originated, simulation modelling (Edmonds et al. 2004; Itan et al. 2009; Klopfstein et al. 2006) has shown that demographic and selection processes can displace spatial allele frequency distributions away from their origin location.
Analysis of the 80 kb haplotype covering the region of LCT and the upstream enhancer confirmed a tight association of the LP variants with particular haplotypes, as described previously (Enattah et al. 2008; Ingram et al. 2007, 2009; Tishkoff et al. 2007) and shows that haplotype diversity differs between populations, with the least diversity observed in Northern Europe. With the extension of the haplotype analysis to about 1.8 Mb, it was possible to consider further the putative signatures of selection for the derived alleles associated with LP. EHH analysis shows the haplotypes carrying the derived LP-associated alleles are much longer than their ancestral counterparts, supporting the recent origin of these variants (Sabeti et al. 2002, 2006). Even though the close proximity of the functional alleles does not allow iHS to be measured separately, iHS patterns for the region are consistent with selection in all groups tested (Supplementary Fig. 5).
The LCT/MCM6 chromosomal region of Europeans had been reported to show one of the strongest ‘signatures’ of selection genome wide (Bersaglieri et al. 2004; Sabeti et al. 2002), namely marked EHH of the derived allele relative to the ancestral allele at rs4988235. While strong selection for LP has been supported by various studies (Aoki 1986; Coelho et al. 2005; Gerbault et al. 2009; Holden and Mace 1997; Itan et al. 2009; Mathieson et al. 2015; Schlebusch et al. 2013; Sverrisdottir et al. 2014), the features of the chromosomal region highlighted here show that other processes, such as recombination, may have influenced the patterns observed.
In particular, we not only confirm the high frequency and wide distribution of the B haplotype in this large data set, as also shown in previous studies (Gallego Romero et al. 2012; Hollox et al. 2001; Ingram et al. 2007; Jones et al. 2013), but also further highlight the notable EHH of the B haplotype. The B haplotype does not carry any known functionally important enhancer alleles subject to positive selection, and is likely to be old, given its widespread geographic distribution; its extended haplotype homozygosity, therefore, requires explanation. We show that the region of EHH overlaps exactly with the region of very little recombination and high LD for all populations, including ones in which no, or very few, LP-associated alleles occur and which consequently cannot have been affected by positive selection for LP. The long gene in the centre of this region of high LD, ZRANB3 (zinc finger, RAN-binding domain containing 3), which encodes a DNA annealing helicase and endonuclease that functions in genome stability (UniProtKB, http://www.uniprot.org/) and is important for the replication stress response (Weston et al. 2012), is much more likely to have been subject to purifying rather than positive selection.
This lack of recombination inferred from measures of LD in samples from unrelated individuals was confirmed by a corresponding lack of recombination events in the large Icelandic families. Ongoing reduced recombination or suppression of recombination might be attributable to lack of clusters of appropriate sequence motifs required for recombination (Myers et al. 2010) or a structural rearrangement of the chromosome, such as an inversion in this region, in one or more of the haplotypes. This non-recombining block most likely explains the asymmetry of the extended haplotypes carrying the functional SNPs, i.e. the haplotypes extend further downstream of the LCT gene even though the functional SNPs (under selection) are located within MCM6 upstream of LCT. This asymmetry can be seen but was not commented on in previous work (Bersaglieri et al. 2004). One could also speculate that a chromosomal rearrangement(s) might have assisted in driving the causative alleles to higher frequency, by transmission distortion similar to that found in other studies (Didion et al. 2016; Odenthal-Hesse et al. 2014). This might contribute to a more rapid increase in frequency and help to explain why the effect of selection seems so high for a phenotype whose selective advantage(s) are still somewhat elusive and environmentally variable (reviewed in Segurel and Bon 2017).
More broadly, our results indicate that regions of the genome in which there has been restricted recombination and where there are relatively few common haplotypes world-wide can give inflated EHH and iHS results, and thus possibly misleading interpretations as to the real extent of selection, when using haplotype-based measures.
Allentoft ME, Sikora M, Sjogren KG, Rasmussen S, Rasmussen M, Stenderup J et al (2015) Population genomics of Bronze Age Eurasia. Nature 522:167–172. doi:10.1038/nature14507
Aoki K (1986) A stochastic model of gene-culture coevolution suggested by the “culture historical hypothesis” for the evolution of adult lactose absorption in humans. Proc Natl Acad Sci USA 83:2929–2933
Bersaglieri T, Sabeti PC, Patterson N, Vanderploeg T, Schaffner SF, Drake JA et al (2004) Genetic signatures of strong recent positive selection at the lactase gene. Am J Hum Genet 74:1111–1120. doi:10.1086/421051
Coelho M, Luiselli D, Bertorelle G, Lopes AI, Seixas S, Destro-Bisol G et al (2005) Microsatellite variation and evolution of human lactase persistence. Hum Genet 117:329–339. doi:10.1007/s00439-005-1322-z
Didion JP, Morgan AP, Yadgary L, Bell TA, McMullan RC, Ortiz de Solorzano L et al (2016) R2d2 drives selfish sweeps in the house mouse. Mol Biol Evol 33:1381–1395. doi:10.1093/molbev/msw036
Edmonds CA, Lillie AS, Cavalli-Sforza LL (2004) Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci USA 101:975–979. doi:10.1073/pnas.0308064100
Enattah NS, Sahi T, Savilahti E, Terwilliger JD, Peltonen L, Jarvela I (2002) Identification of a variant associated with adult-type hypolactasia. Nat Genet 30:233–237. doi:10.1038/ng826
Enattah NS, Jensen TG, Nielsen M, Lewinski R, Kuokkanen M, Rasinpera H et al (2008) Independent introduction of two lactase-persistence alleles into human populations reflects different history of adaptation to milk culture. Am J Hum Genet 82:57–72. doi:10.1016/j.ajhg.2007.09.012
Fang L, Ahn JK, Wodziak D, Sibley E (2012) The human lactase persistence-associated SNP -13910*T enables in vivo functional persistence of lactase promoter-reporter transgene expression. Hum Genet 131:1153–1159. doi:10.1007/s00439-012-1140-z
Gallego Romero I, Basu Mallick C, Liebert A, Crivellaro F, Chaubey G, Itan Y et al (2012) Herders of Indian and European cattle share their predominant allele for lactase persistence. Mol Biol Evol 29:249–260. doi:10.1093/molbev/msr190
Gerbault P, Moret C, Currat M, Sanchez-Mazas A (2009) Impact of selection and demography on the diffusion of lactase persistence. PLoS One 4:e6369. doi:10.1371/journal.pone.0006369
Gerbault P, Liebert A, Itan Y, Powell A, Currat M, Burger J et al (2011) Evolution of lactase persistence: an example of human niche construction. Philos Trans R Soc Lond B Biol Sci 366:863–877. doi:10.1098/rstb.2010.0268
Haak W, Lazaridis I, Patterson N, Rohland N, Mallick S, Llamas B et al (2015) Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522:207–211. doi:10.1038/nature14317
Holden C, Mace R (1997) Phylogenetic analysis of the evolution of lactose digestion in adults. Hum Biol 69:605–628
Hollox EJ, Poulter M, Zvarik M, Ferak V, Krause A, Jenkins T et al (2001) Lactase haplotype diversity in the Old World. Am J Hum Genet 68:160–172. doi:10.1086/316924
Ingram CJ, Elamin MF, Mulcare CA, Weale ME, Tarekegn A, Raga TO et al (2007) A novel polymorphism associated with lactose tolerance in Africa: multiple causes for lactase persistence? Hum Genet 120:779–788. doi:10.1007/s00439-006-0291-1
Ingram CJ, Raga TO, Tarekegn A, Browning SL, Elamin MF, Bekele E et al (2009) Multiple rare variants as a cause of a common phenotype: several different lactase persistence associated alleles in a single ethnic group. J Mol Evol 69:579–588. doi:10.1007/s00239-009-9301-y
Itan Y, Powell A, Beaumont MA, Burger J, Thomas MG (2009) The origins of lactase persistence in Europe. PLoS Comput Biol 5:e1000491. doi:10.1371/journal.pcbi.1000491
Jensen TG, Liebert A, Lewinsky R, Swallow DM, Olsen J, Troelsen JT (2011) The -14010*C variant associated with lactase persistence is located between an Oct-1 and HNF1alpha binding site and increases lactase promoter activity. Hum Genet 130:483–493. doi:10.1007/s00439-011-0966-0
Jones BL, Raga TO, Liebert A, Zmarz P, Bekele E, Danielsen ET et al (2013) Diversity of lactase persistence alleles in Ethiopia: signature of a soft selective sweep. Am J Hum Genet 93:538–544. doi:10.1016/j.ajhg.2013.07.008
Jones BL, Oljira T, Liebert A, Zmarz P, Montalva N, Tarekeyn A et al (2015) Diversity of lactase persistence in African milk drinkers. Hum Genet 134:917–925. doi:10.1007/s00439-015-1573-2
Klopfstein S, Currat M, Excoffier L (2006) The fate of mutations surfing on the wave of a range expansion. Mol Biol Evol 23:482–490. doi:10.1093/molbev/msj057
Kong A, Thorleifsson G, Gudbjartsson DF, Masson G, Sigurdsson A, Jonasdottir A et al (2010) Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467:1099–1103. doi:10.1038/nature09525
Liebert A, Jones BL, Danielsen ET, Olsen AK, Swallow DM, Troelsen JT (2016) In vitro functional analyses of infrequent nucleotide variants in the lactase enhancer reveal different molecular routes to increased lactase promoter activity and lactase persistence. Ann Hum Genet 80:307–318. doi:10.1111/ahg.12167
Maniatis N, Collins A, Morton NE (2007) Effects of single SNPs, haplotypes, and whole-genome LD maps on accuracy of association mapping. Genet Epidemiol 31:179–188. doi:10.1002/gepi.20199
Mathieson I, Lazaridis I, Rohland N, Mallick S, Patterson N, Roodenberg SA et al (2015) Genome-wide patterns of selection in 230 ancient Eurasians. Nature 528:499–503. doi:10.1038/nature16152
Myers S, Bowden R, Tumian A, Bontrop RE, Freeman C, MacFie TS et al (2010) Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science 327:876–879. doi:10.1126/science.1182363
Myles S, Bouzekri N, Haverfield E, Cherkaoui M, Dugoujon JM, Ward R (2005) Genetic evidence in support of a shared Eurasian-North African dairying origin. Hum Genet 117:34–42. doi:10.1007/s00439-005-1266-3
Odenthal-Hesse L, Berg IL, Veselis A, Jeffreys AJ, May CA (2014) Transmission distortion affecting human noncrossover but not crossover recombination: a hidden source of meiotic drive. PLoS Genet 10:e1004106. doi:10.1371/journal.pgen.1004106
Olds LC, Sibley E (2003) Lactase persistence DNA variant enhances lactase promoter activity in vitro: functional role as a cis regulatory element. Hum Mol Genet 12:2333–2340. doi:10.1093/hmg/ddg244
Plantinga TS, Alonso S, Izagirre N, Hervella M, Fregel R, van der Meer JW et al (2012) Low prevalence of lactase persistence in Neolithic South-West Europe. Eur J Hum Genet 20:778–782. doi:10.1038/ejhg.2011.254
Poulter M, Hollox E, Harvey CB, Mulcare C, Peuhkuri K, Kajander K et al (2003) The causal element for the lactase persistence/non-persistence polymorphism is located in a 1 Mb region of linkage disequilibrium in Europeans. Ann Hum Genet 67:298–311
Priehodova E, Austerlitz F, Cizkova M, Mokhtar MG, Poloni ES, Cerny V (2017) The historical spread of Arabian Pastoralists to the eastern African Sahel evidenced by the lactase persistence -13,915*G allele and mitochondrial DNA. Am J Hum Biol. doi:10.1002/ajhb.22950
Ranciaro A, Campbell MC, Hirbo JB, Ko WY, Froment A, Anagnostou P et al (2014) Genetic origins of lactase persistence and the spread of pastoralism in Africa. Am J Hum Genet 94:496–510. doi:10.1016/j.ajhg.2014.02.009
Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF et al (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832–837. doi:10.1038/nature01140
Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O et al (2006) Positive natural selection in the human lineage. Science 312:1614–1620. doi:10.1126/science.1124309
Schlebusch CM, Sjodin P, Skoglund P, Jakobsson M (2013) Stronger signal of recent selection for lactase persistence in Maasai than in Europeans. Eur J Hum Genet 21:550–553. doi:10.1038/ejhg.2012.199
Segurel L, Bon C (2017) On the evolution of lactase persistence in humans. Annu Rev Genomics Hum Genet. doi:10.1146/annurev-genom-091416-035340
Stephens M, Donnelly P (2003) A comparison of bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73:1162–1169. doi:10.1086/379378
Stephens M, Smith NJ, Donnelly P (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68:978–989. doi:10.1086/319501
Sverrisdottir OO, Timpson A, Toombs J, Lecoeur C, Froguel P, Carretero JM et al (2014) Direct estimates of natural selection in Iberia indicate calcium absorption was not the only driver of lactase persistence in Europe. Mol Biol Evol 31:975–983. doi:10.1093/molbev/msu049
Szpiech ZA, Hernandez RD (2014) Selscan: an efficient multithreaded program to perform EHH-based scans for positive selection. Mol Biol Evol 31:2824–2827. doi:10.1093/molbev/msu211
Tishkoff SA, Reed FA, Ranciaro A, Voight BF, Babbitt CC, Silverman JS et al (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 39:31–40. doi:10.1038/ng1946
Troelsen JT, Olsen J, Moller J, Sjostrom H (2003) An upstream polymorphism associated with lactase persistence has increased enhancer activity. Gastroenterology 125:1686–1694
Weston R, Peeters H, Ahel D (2012) ZRANB3 is a structure-specific ATP-dependent endonuclease involved in replication stress response. Genes Dev 26:1558–1572. doi:10.1101/gad.193516.112
We thank Mari Wyn Burley and the UCL Centre for Comparative Genomics for help with sequencing and many other members of GEE for help and advice; and we are very grateful to all sample collectors and sample donors. We thank Iain Mathieson for help with data used for Supplementary Fig. 3. This work was funded by EU Marie Curie ITN FP7 Framework Programme grant, LeCHE, grant ref 215362-2 (AL, PG, MT, DS), Bicentennial Becas–Chile Scholarship for the Advanced Human Capital Program by the Chilean National Commission for Scientific and Technological Research (CONICYT)(NM), and the Annals of Human Genetics (NM, AL), and an MRC-DTA studentship (BL).
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Cite this article
Liebert, A., López, S., Jones, B.L. et al. World-wide distributions of lactase persistence alleles and the complex effects of recombination and selection. Hum Genet 136, 1445–1453 (2017). https://doi.org/10.1007/s00439-017-1847-y