Human Genetics

Volume 124, Issue 6, pp 579–591

Lactose digestion and the evolutionary genetics of lactase persistence


  • Catherine J. E. Ingram
    • Department of Genetics, Evolution and Environment, University College London
  • Charlotte A. Mulcare
    • Department of Genetics, Evolution and Environment, University College London
  • Yuval Itan
    • Department of Genetics, Evolution and Environment, University College London
    • Centre for Mathematics and Physics in the Life Sciences and Experimental Biology (CoMPLEX), University College London
  • Mark G. Thomas
    • Department of Genetics, Evolution and Environment, University College London
Review Article

DOI: 10.1007/s00439-008-0593-6

Cite this article as:
Ingram, C.J.E., Mulcare, C.A., Itan, Y. et al. Hum Genet (2009) 124: 579. doi:10.1007/s00439-008-0593-6


It has been known for some 40 years that lactase production persists into adult life in some people but not in others. However, the mechanism and evolutionary significance of this variation have proved more elusive, and continue to excite the interest of investigators from different disciplines. This genetically determined trait differs in frequency worldwide and is due to cis-acting polymorphism of regulation of lactase gene expression. A single nucleotide polymorphism located 13.9 kb upstream from the lactase gene (C-13910 > T) was proposed to be the cause, and the −13910*T allele, which is widespread in Europe, was found to be located on a very extended haplotype of 500 kb or more. The long region of haplotype conservation reflects a recent origin, and this, together with high frequencies, is evidence of positive selection, but also means that −13910*T might be an associated marker rather than itself being causal of lactase persistence. Doubt about function was increased when it was shown that the original SNP did not account for lactase persistence in most African populations. However, the recent discovery that there are several other SNPs associated with lactase persistence in close proximity (within 100 bp), and that they all reside in a piece of sequence that has enhancer function in vitro, does suggest that they may each be functional, and their occurrence on different haplotype backgrounds shows that several independent mutations led to lactase persistence. Here we provide access to a database of worldwide distributions of lactase persistence and of the C-13910*T allele, as well as reviewing lactase molecular and population genetics and the role of selection in determining present day distributions of the lactase persistence phenotype.


Lactase, the small intestinal enzyme responsible for cleaving lactose into its constituent absorbable monosaccharides, glucose and galactose, is essential for the nourishment of newborn mammals, whose sole source of nutrition is milk, in which lactose is the major carbohydrate component. In adult mammals other than humans lactase production decreases significantly in quantity following weaning (Buller et al. 1990; Lacey et al. 1994; Sebastio et al. 1989). Although individual differences in the ability of human adults to digest milk had been remarked upon in Roman times, variation in expression of lactase was not established as a genetically determined trait until the second half of the twentieth century. Indeed before this, expression of high levels of lactase in adulthood was considered by people of European descent to be the ‘normal’ state of affairs, and widespread deficiency of lactase in adults was only appreciated in the early 1960s (Auricchio et al. 1963; Dahlqvist et al. 1963).

Here, we review all aspects of this polymorphism from description of phenotype to molecular and evolutionary genetics. Since we had noted that the population distribution data available in many literature reviews contained anomalous information (as will be discussed below) we also provide access to a newly constructed database of phenotypic data taken from source publications.

Determination of lactase persistence status

People whose lactase persists at high levels throughout adult life are said to be lactase persistent while those with little lactase as adults are described as lactase non-persistent (also referred to in the literature as primary adult hypolactasia). Since taking intestinal biopsies from healthy people is invasive and not acceptable unless the person is having other investigations, lactase persistence status is often inferred by a method depending on lactose digestion. This allows people to be classified as lactose digesters and maldigesters. This difference in digestion is measured by a test traditionally known as a ‘lactose tolerance test’ and thus the terms tolerant and intolerant are sometimes used, though this can be confused with dietary intolerance.

The lactose tolerance test usually involves giving a lactose load after an overnight fast and then measuring blood glucose or breath hydrogen. A baseline measurement of blood glucose or breath hydrogen is taken before ingestion of the lactose, and then at various time intervals thereafter. An increase in blood glucose indicates lactose digestion (glucose produced from the lactose hydrolysis is absorbed into the bloodstream), and no increase, or a ‘flat line’, is indicative of a lactose maldigester (probable lactase non-persistent) phenotype. An increase in breath hydrogen indicates maldigestion and reflects colonic fermentation of the lactose, as described in the following section. In both cases somewhat arbitrary cut-off points have to be set for distinguishing the two phenotypes, and both methods inform upon the person’s ability to digest lactose rather than the given individual’s lactase expression. It must therefore be borne in mind that there will be an underlying error rate, leading to both false negatives and false positives. The relative efficiency of the tests has been examined in more than one study, and the breath hydrogen method was found to be the most accurate (Mulcare et al. 2004; Newcomer et al. 1975; Peuhkuri 2000). It is also convenient and cheap. Lactase levels can, however, be secondarily reduced by gastrointestinal disease, leading to secondary lactose intolerance, and some people fail to produce hydrogen. In the clinical setting there are ways of improving the quality of the test. These include retesting, giving a dose of a non-digestible carbohydrate, lactulose, to test for the presence of hydrogen producing bacteria (see section below), and investigating other causes of the lactose intolerance, which might include examination of biopsy material.
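The decision logic of the two test variants can be sketched as follows. This is only an illustration: the cut-off values (a 20 ppm rise in breath hydrogen and a 1.1 mmol/L rise in blood glucose) are assumptions chosen for the example, not values given in this review, which notes only that the cut-offs are somewhat arbitrary. The function and parameter names are likewise hypothetical.

```python
from typing import Sequence

# Illustrative cut-offs only (assumptions, not taken from this review).
H2_RISE_PPM = 20.0          # rise in breath hydrogen over baseline
GLUCOSE_RISE_MMOL_L = 1.1   # rise in blood glucose over baseline


def classify_breath_hydrogen(baseline: float, readings: Sequence[float],
                             cutoff: float = H2_RISE_PPM) -> str:
    """A rise in breath hydrogen reflects colonic fermentation, i.e. maldigestion."""
    peak_rise = max(readings) - baseline
    return "maldigester" if peak_rise >= cutoff else "digester"


def classify_blood_glucose(baseline: float, readings: Sequence[float],
                           cutoff: float = GLUCOSE_RISE_MMOL_L) -> str:
    """A rise in blood glucose indicates digestion; a 'flat line' indicates maldigestion."""
    peak_rise = max(readings) - baseline
    return "digester" if peak_rise >= cutoff else "maldigester"
```

Note that the two tests read in opposite directions: a rise classifies a person as a maldigester in the breath hydrogen test but as a digester in the blood glucose test. Neither function models the false negatives and false positives discussed above.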

Symptoms of lactose intolerance

Undigested lactose passing through the small intestine into the colon has two physiological effects. First, an osmotic gradient is set up across the gut wall, which results in an influx of water, causing symptoms of diarrhoea. Second, the lactose can be fermented by colonic bacteria, to produce fatty acids and gaseous by-products (including hydrogen, used in the tolerance test), potentially causing discomfort, bloating and flatulence. However most lactase non-persistent individuals can tolerate small amounts of lactose (as in tea or coffee), and some can consume a lot without ill effects (Scrimshaw and Murray 1988; Suarez et al. 1997). Variation in the composition of the gut flora between individuals (Hertzler et al. 1997; Hertzler and Savaiano 1996), as well as a psychosomatic component (Briet et al. 1997; Peuhkuri et al. 2000; Saltzman et al. 1999) may account for some of the interindividual variation in symptoms.

Worldwide distribution of lactase persistence

Surveys of lactase persistence phenotype frequencies have been carried out in many populations over the years, so that the global distribution of lactase persistence is now fairly well characterised (Flatz 1987; Swallow and Hollox 2000; Table 1 supplementary information; Fig. 1a). This reveals that lactase non-persistence is the most common phenotype in humans (65% if one takes into account population census size as shown in Table 2 of the supplementary information), with lactase persistence being common only in certain populations with a long history of pastoralism and milking (McCracken 1971; Simoons 1970). Lactase persistence is at highest frequency in north-western Europe, with a decreasing cline to the south and east. On the Indian subcontinent the frequency of lactase persistence is higher in the north-west than elsewhere, and further east than India the lactase persistence frequency is generally low. In Africa, the distribution is patchy, with some pastoralist nomadic tribes having high frequencies of lactase persistence compared with neighbouring groups living in the same country (Bayoumi et al. 1981, 1982), with a similar pattern observed between Bedouin and neighbouring populations in the Middle East (Fig. 2, Cook and al-Torki 1975; Dissanyake et al. 1990; Snook et al. 1976).
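A census-weighted global frequency of the kind quoted above can be computed as a weighted average of per-population frequencies. The sketch below assumes a simple list of (frequency, census size) pairs; the function name and the input values used in the usage note are hypothetical and are not taken from the supplementary tables.

```python
from typing import Iterable, Tuple


def census_weighted_frequency(populations: Iterable[Tuple[float, float]]) -> float:
    """Weighted mean of phenotype frequencies, weighted by census size.

    Each item is (phenotype_frequency, census_size).
    """
    rows = list(populations)  # allow any iterable, since we pass over it twice
    total_size = sum(size for _, size in rows)
    if total_size <= 0:
        raise ValueError("total census size must be positive")
    return sum(freq * size for freq, size in rows) / total_size
```

For example, two made-up populations with persistence frequencies of 0.9 and 0.1 and census sizes of 10 and 30 give a weighted frequency of (0.9 × 10 + 0.1 × 30) / 40 = 0.3, much closer to the larger population than the unweighted mean of 0.5.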
Fig. 1

Interpolated maps of the ‘old world’ showing the distribution of (a) lactase persistence data taken from the literature (Supplementary data Table 1), (b) −13910*T distribution and (c) lactase persistence frequency predicted from the −13910*T distribution, using the data collection to be found in Supplementary data Table 3. Maps were generated using PYNGL. The maps include only unrelated individuals over 12 years of age, and only literature for which the original publications have been located and checked. Articles in which there was clear selection bias, and recent immigrant populations, are excluded, but the data can be found in Supplementary data Table 1. The Americas are excluded from all maps because of the paucity of data. Most data were obtained from lactose tolerance tests using either breath hydrogen or blood glucose, though in some cases enzyme assay data were available. Locations were either as described precisely in the publication, or taken from capital cities or central points of a country or region where the precise location is not mentioned. Where more than one data set was available, weighted averages of the data were taken. Predicted frequency was taken to be p² + 2pq, where p is the frequency of −13910*T and q = 1 − p. Data points are shown as dots. It should be noted that the interpolation is inaccurate where there are few data points. A colour version of this figure can be found in the electronic supplementary information.
Fig. 2

Examples of countries/geographic regions in which individual ethnic groups display large differences in lactose absorption capacity. See Supplementary data (Table 1) for details

The noted correlation of lactase persistence phenotype with the cultural practice of milking generated the hypothesis that this trait has been subject to strong positive selection (Aoki 1986; Holden and Mace 1997; McCracken 1971; Simoons 1970, 1978).

Identifying the causes of lactase persistence

By the early 1970s it was established that the lactase persistence polymorphism in humans has a genetic cause, and is inherited in an autosomal dominant manner (Ferguson and Maxwell 1967; Metneki et al. 1984; Sahi 1974). Further evidence that lactase persistence is a genetic trait, and more specifically that it is caused by a cis-acting element, was produced in the early 1980s. Ho et al. reported a trimodal distribution of sucrase:lactase ratios in intestinal samples from British adults of northern European ancestry. The trimodal distribution was interpreted as attributable to groups of individuals homozygous for lactase persistence (highest lactase activity), heterozygotes with mid-level activity and non-persistent homozygotes with low lactase activity (Ho et al. 1982), and similar results were subsequently obtained in individuals of German ancestry (Flatz 1984). The intermediate lactase activity observed in the heterozygotes indicated that only one copy of the lactase gene was being fully expressed. Evidence for transcriptional regulation (Escher et al. 1992) and confirmatory evidence for the cis-acting nature of this (Wang et al. 1995) was obtained from mRNA studies.

Sequencing of LCT and the immediate promoter region in Europeans showed no nucleotide changes that were absolutely associated with persistence/non-persistence (Boll et al. 1991; Lloyd et al. 1992; Poulter et al. 2003). However, several polymorphisms do exist across the 50 kb LCT gene and association studies revealed that very few haplotypes occur in most of the human populations tested, although greater diversity was observed in African populations (Hollox et al. 2001). One combination of alleles designated the ‘A’ haplotype (Fig. 3) is particularly common in northern Europe and is associated with lactase persistence (Harvey et al. 1998). A putative causative single nucleotide polymorphism (C-13910 > T) was subsequently identified 13.9 kb upstream of the LCT transcription initiation site (Enattah et al. 2002) (Fig. 3). It is located in an intron of an adjacent gene, MCM6, and occurs exclusively on the background of the A haplotype (Poulter et al. 2003).
Fig. 3

Diagrammatic representation of the genes MCM6 and LCT. The arrow indicates the location of −13910*T, and the other alleles shown more recently to be associated with lactase persistence. Locations of SNPs used for LCT core haplotype analysis are shown, with the possible allelic combinations of the four common worldwide 11 SNP haplotypes described in Hollox et al. (2001). The open circles indicate an ancestral allele and filled circles denote the derived allele at a locus. SNPs used for assessing haplotype background of the lactase persistence associated variants in our own studies are 4, 6, 9 and 10

The −13910*T allele was found to associate completely with lactase persistence, ascertained directly by enzyme activity in 196 Finnish individuals, and subsequent studies have confirmed a tight but not absolute association between −13910*T and lactase persistence as judged by lactose tolerance testing in populations of northern European ancestry (Bernardes-Silva et al. 2007; Hogenauer et al. 2005; Kerber et al. 2007; Poulter et al. 2003) and there was also a correlation, but not absolute, between genotypes and enzymatic activity (Poulter et al. 2003). However the A haplotype extends far beyond the 50 kb LCT gene region, with carriers of the −13910*T allele having almost identical chromosomes extending for nearly 1 Mb (Bersaglieri et al. 2004; Poulter et al. 2003).

Evidence for function of −13910*T

In vitro studies provided evidence that the −13910*T allele increases transcription in promoter–reporter construct assays in cell lines (Lewinsky et al. 2005; Olds and Sibley 2003; Troelsen et al. 2003), suggesting that it may have enhancer activity in vivo. A transcription factor, Oct-1, was identified which bound more strongly to the −13910*T containing motif than to the alternative C allele, providing a possible mechanism for up-regulation of LCT (Lewinsky et al. 2005), and suggesting that the cause of lactase persistence had been identified (Rasinpera et al. 2004), although many questions remain unanswered.

Population distribution of −13910*T: −13910*T does not account for lactase persistence worldwide and is rare in sub-Saharan African populations

Using carefully checked primary source literature data (Supplementary Table 1) we failed to obtain the tight correlation of −13910*T with published worldwide lactase persistence phenotype frequency reported elsewhere (Enattah et al. 2007), but it is clear that in Europe the frequency distribution of −13910*T is in broad agreement with that expected from the distribution of the phenotype (Fig. 1). Figure 1a shows an interpolated contour map depicting the distribution of lactase persistence, prepared from phenotypic data taken from all the available literature in which we were confident of the phenotypic testing, and from which children, family members, patients selected for likely intolerance, and twentieth/twenty-first century immigrants were excluded. Figure 1b shows the distribution of −13910*T, and details of the worldwide −13910*T data can be found in the supplementary information (Supplementary Table 3). Figure 1c shows the predicted lactose tolerance distribution taken from −13910*T frequencies, assuming that −13910*T is the sole cause of lactase persistence and is dominant (p² + 2pq, where p is the frequency of −13910*T and q = 1 − p).
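The prediction used for Fig. 1c can be written out as a short calculation. The sketch below assumes Hardy–Weinberg proportions and a fully dominant allele, as stated above; the function names are hypothetical. The inverse function is included only to illustrate how an allele frequency would be implied by an observed phenotype frequency under the same assumptions.

```python
import math


def predicted_persistence_frequency(p: float) -> float:
    """Expected frequency of the dominant phenotype: p^2 + 2pq, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return p * p + 2.0 * p * q


def implied_allele_frequency(phenotype_frequency: float) -> float:
    """Inverse of the above: p = 1 - sqrt(1 - phenotype frequency)."""
    if not 0.0 <= phenotype_frequency <= 1.0:
        raise ValueError("phenotype frequency must lie between 0 and 1")
    return 1.0 - math.sqrt(1.0 - phenotype_frequency)
```

With an allele frequency of 0.5, for instance, the predicted phenotype frequency is 0.25 + 0.5 = 0.75. A population whose observed persistence frequency is well above the value predicted from its −13910*T frequency is one in which this allele alone cannot account for the phenotype.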

In contrast to the high frequency in Europe, −13910*T is rare in sub-Saharan African populations (Fig. 1b) even in those populations where lactase persistence frequency is reported to be high (Mulcare et al. 2004), and it is also rare in the Bedouins of the Arabian peninsula, who are also frequently lactose digesters (Ingram et al. 2007). The allele was also absent from all but one of a series of phenotyped individuals of Sudanese ancestry (Ingram et al. 2007). An obvious interpretation was that -13910*T is not truly causal of lactase persistence, but is a very strongly associated marker of the causal element, which appeared on the lactase persistence carrying (A haplotype) chromosome after humans had spread out of Africa. However there was also no association with A haplotype in this African group and subsequent research indicated genetic heterogeneity.

New variants in intron 13 of MCM6, and multiple causes of lactase persistence in Africa

Three studies revealed several new sequence variants in very close proximity (Figs. 3, 4; Table 1) to −13910*T (Enattah et al. 2008; Ingram et al. 2007; Tishkoff et al. 2007), two of which are clearly associated with lactase persistence in different parts of East Africa (−13915*G and −14010*C). One of these, −13915*G, was also shown to be associated with high lactase expression in Saudi Arabia (Imtiaz et al. 2007). A third SNP, −13907*G, showed much weaker evidence, but was found in several studies (Enattah et al. 2008; Ingram 2008; Ingram et al. 2007; Tishkoff et al. 2007), and there were several other candidates found in lactase persistent or milk drinking people (Enattah et al. 2008; Ingram et al. 2007; Ingram 2008; Tag et al. 2007; Tishkoff et al. 2007). However, even taking these additional variants into account, and supposing them all to be functional, association with phenotype was not complete. Although the occurrence of a few individuals who carried an allele but were lactose maldigesters could be explained by secondary lactase loss, individuals who were digesters but carried no putative causative allele in this genomic region still had to be explained, indicating that there may be more, as yet unidentified, causal variants. The genomic region may be particularly susceptible to mutations, and these ‘recent’ derived variants might simply be markers of a causal element elsewhere. However, the three newly described SNPs all occur on different haplotype backgrounds from each other (using our old nomenclature: −13907*G, on A, −13915*G, on C, and −14010*C probably on B) (Enattah et al. 2008; Ingram et al. 2007; Ingram 2008; Tishkoff et al. 2007), although −13907*G is on the same haplotype as −13910*T. In each case the haplotypes extend well beyond the ~−14 kb allele in both directions, showing clearly that the derived alleles cannot simply be markers for a single shared causal variant, and that there must be several independent causes of lactase persistence. 
Each of the alleles has a different geographic distribution, and the preliminary data suggest that -13915*G arose in the Middle East, while −13907*G and −14010*C arose in eastern Africa.
Fig. 4

Sequence of the enhancer region in intron 13 of MCM6 showing the positions of characterised transcription factor binding sites (Lewinsky et al. 2005) and the SNPs that have been shown to associate with lactase persistence. Note that the protein binding region −13926 to −13909 is comprised of two partially overlapping sites (Oct-1 and GATA6 as indicated). Several other SNPs that have been identified by ourselves and others, in this region, including −13913T > C are not shown since, as yet, no evidence of association with phenotype is available

Table 1

Details of SNPs known to be associated with lactase persistence as of July 2008

| Position of SNP (bp upstream of LCT) | Substitution (ancestral allele first, from comparison with chimp) | rs number | Evidence of association with lactase persistence | Evidence of function | Haplotype (Hollox et al. 2000 nomenclature) | Geographic location of highest observed frequency |
| --- | --- | --- | --- | --- | --- | --- |
| −14010 | G > C | Not included in dbSNP | Tishkoff et al. (2007) | Tishkoff et al. (2007) | | |
| −13915 | T > G | | Ingram et al. (2007), Tishkoff et al. (2007), Imtiaz et al. (2007) | Tishkoff et al. (2007), Enattah et al. (2008) | | Saudi Arabia |
| −13910 | C > T | | Enattah et al. (2002) | Troelsen et al. (2003), Olds and Sibley (2003), Lewinsky et al. (2005) | | |
| −13907 | C > G | | Tishkoff et al. (2007), Ingram (2008) | Tishkoff et al. (2007), Enattah et al. (2008) | | |

Note that we and others have identified a total of ten other alleles (including −13913*C) within the 130 bp region −13,900 to −14,030 for which studies of their association and function are ongoing.

Evidence of function for the alleles identified in Africa

It is important to critically evaluate the evidence for function of these recently described alleles. Footprint analysis, to determine DNA–protein binding sites, of sequence encompassing the intron 13 region revealed transcription factor recognition sequences for Cdx-2, GATA, HNF3α/Fox and HNF4α along with Oct-1 (Lewinsky et al. 2005). Two of the newly identified SNPs are located within the Oct-1 binding site (Fig. 4). Electrophoretic mobility shift assays (EMSAs) used to ascertain the effect of the new alleles on Oct-1 binding showed that only oligonucleotide probes containing the original allele, −13910*T, bound strongly to Oct-1; −13907*G bound to a much lesser extent (Enattah et al. 2008; Ingram et al. 2007), and binding of the other alleles was weaker still or undetectable. It can therefore be concluded that the simple change in binding of the protein Oct-1 to this site is unlikely to play a critical role in causing lactase persistence. The identification of the other associated allele, −14010*C (Tishkoff et al. 2007), situated 100 bp away from the predicted Oct-1 binding site, would appear to confirm this.

In vitro promoter/reporter analysis of the newly identified MCM6 intron 13 variant alleles, however, lends some support to the idea that they do affect enhancer activity. Transcriptional activity of the LCT core promoter was enhanced up to tenfold by addition of sequences from MCM6 intron 13 that include the ancestral variant (Lewinsky et al. 2005; Olds and Sibley 2003; Tishkoff et al. 2007). This activity increased further (by up to 25% more) when one of the variant alleles (−14010*C, −13907*G or −13915*G) was present (Tishkoff et al. 2007). This effect is in fact small, and the authors did not include −13910*T as a positive control (previously shown to enhance transcription activity a further 80% compared to the ancestral allele; Troelsen et al. 2003). Although a recent paper of Enattah et al. (2008) does confirm an effect for −13915*G, the results are hard to evaluate because additional sequences are included in the construct, and the control −13910*T shows very little effect in this study. However, in the Enattah et al. (2008) paper the Caco-2 cells were not differentiated, as they had been in some of the previous studies (Troelsen et al. 2003). This also flags the problem of the appropriateness of the cell model. Caco-2 is a colon cell line, the only line known to express lactase, and has features more comparable with fetal small intestine (Hauri et al. 1985).

The predictive value of these in vitro functional studies with respect to the effect exerted in vivo by particular alleles is therefore uncertain, but the observations, together with those made previously (Lewinsky et al. 2005; Olds and Sibley 2003; Troelsen et al. 2003) do suggest, though do not confirm that this region is important in regulation of LCT expression. But how it allows low expression in fetuses, high expression in babies and then down-regulation in some but not other people is currently hard to envisage. Studies in mice flag the complexities of interpretation of in vitro studies, and indeed in vivo studies highlight the subtleties of tissue and developmental control (Bosse et al. 2006a, b, 2007; van Wering et al. 2004). Unfortunately there are severe restrictions to animal models in elucidating this uniquely human polymorphism.

The role of other factors influencing lactase expression

The immediate promoter of LCT is moderately well characterised in rat, pig and human (Fang et al. 2000, 2001; Krasinski et al. 2001; Lee et al. 2002; Mitchelmore et al. 2000; Spodsberg et al. 1999; Troelsen et al. 1994, 1997; van Wering et al. 2004; Wang et al. 2006), and there are several allelic variants within the first kilobase of human sequence (Harvey et al. 1995; Hollox et al. 1999; Lloyd et al. 1992). Although none of them is causal of persistence, it is just possible that variations in these SNPs affect expression under certain circumstances or at certain developmental stages: one study shows that the allele -958*T (characteristic of the B haplotype) reduces binding to an uncharacterised transcription factor (Hollox et al. 1999). Whilst it has been well established that regulation of LCT is predominantly under genetically determined transcriptional control there is evidence that other factors influence inter-individual differences in expression of the enzyme. Heterogeneity of the lactase non-persistence phenotype was reported by a number of research groups in their early studies. Some investigators observed individuals who show slower/abnormal processing of their lactase protein (Sterchi et al. 1990; Witte et al. 1990) which may imply variation in post-translational controls such as proteolytic cleavage, glycosylation and/or transport to the cell surface, which are involved in the normal processing of lactase (Jacob et al. 1994, 1995, 1996, 2002; Naim and Lentze 1992). Others have made observations suggestive of epigenetic regulation (Maiuri et al. 1991, 1994). Although most non-persistent individuals show no staining for lactase in the jejunal biopsies of the small intestine (concordant with low lactase activity and transcriptional regulation of LCT), some individuals show patchy expression of the enzyme in the intestinal epithelia (Maiuri et al. 1991, 1994). 
This mosaic expression pattern might be attributable to somatic cell changes in methylation, or histone acetylation but curiously this is not attributable to an ‘inherited’ change in expression pattern from a single stem cell, since in that case ‘ribbons’ of positively stained cells would be expected.

Evolutionary considerations

The original observations in the 1970s and 1980s of a positive correlation between lactase persistence frequencies and milk drinking led to the widely held notion that lactase persistence has been subject to positive selection. In the intervening years molecular evidence has accumulated which would appear to corroborate this hypothesis. Our group first reported on the unusual pattern of lactase gene haplotype diversity across populations (Hollox et al. 2001). We found only four common 50 kb haplotypes outside Africa, with many more within Africa, and a very high frequency of the A haplotype in northern Europe, and suggested that the very different haplotype frequencies observed in N. Europeans as compared to other populations are most probably explained by a combination of genetic drift and strong positive selection for lactase persistence (Hollox et al. 2001).

More recently it has been shown that −13910*T occurs on an unusually extended haplotype background, which is present in the northern European population at very high frequency (Bersaglieri et al. 2004; Poulter et al. 2003). This is consistent with a model of recent positive selection, in which alleles surrounding the causal variant ‘hitch-hike’ rapidly to high frequency due to strong positive selection, and haplotype length is exaggerated, indicating a recent mutation event where recombination has not decayed the allelic associations in the region (reviewed in Sabeti et al. 2006). The −13910*T carrying chromosome is a real outlier in the context of molecular signatures of selection compared with the rest of the human genome (HapMap Consortium 2003). Decreased diversity of microsatellite polymorphisms (STRs) that occurs in the region of LCT and MCM6 was also found for the −13910*T carrying chromosomes, indicating that this allele has risen in frequency quickly and recently (Coelho et al. 2005; Mulcare 2006) (Fig. 5).
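The link between haplotype length and allele age can be illustrated with a deliberately simplified calculation: the probability that a stretch of chromosome escapes recombination falls geometrically with the number of generations elapsed. The sketch below is not the dating method used in the studies cited. It ignores selection, drift and variation in recombination rate, and its default values (25 years per generation, and the 1 cM per Mb used in the usage note) are assumptions made for the example.

```python
def years_to_generations(years: float, years_per_generation: float = 25.0) -> float:
    """Convert an age in years to generations (generation time is an assumption)."""
    if years_per_generation <= 0:
        raise ValueError("years_per_generation must be positive")
    return years / years_per_generation


def prob_haplotype_intact(genetic_length_morgans: float, generations: float) -> float:
    """Probability that a segment has had no crossover in any generation.

    Treats the genetic length (in Morgans) as the per-generation
    probability of a crossover within the segment.
    """
    if not 0.0 <= genetic_length_morgans <= 1.0:
        raise ValueError("genetic length must lie between 0 and 1 Morgan")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - genetic_length_morgans) ** generations
```

Taking a segment of about 1 Mb as roughly 0.01 Morgans, `prob_haplotype_intact(0.01, years_to_generations(9000))` is far larger than the same quantity for an allele 100,000 years old, which is why a long shared haplotype at high frequency points to a recent origin.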
Fig. 5

Pie charts showing microsatellite LCT/MCM6 haplotypes on chromosomes of different SNP haplotype background: A haplotype carrying −13910*T, A haplotype carrying −13910*C and non-A haplotype chromosomes. 5579*C (rs2278544), SNP 10 in Fig. 3, was used as a marker for A haplotype and 5579*T as a marker for non-A haplotype, and the A haplotype chromosomes are subdivided into those that do and do not carry −13910*T. The lactase persistence associated SNP, −22018*A (rs182549), originally described in Enattah et al. (2002), was tested on all samples and −22018*A correlated in all but one sample with −13910*T. Data were taken from families and the haplotypes inferred from family structure. Data sets from: Irish, n = 65 chromosomes; English, n = 64; German, n = 60; French, n = 38; Ashkenazi Jews, n = 96; Armenian, n = 88; Kuwaiti, n = 28; Algerian, n = 20; Ethiopian, Amharic, n = 118; n values for main charts shown. The inset small charts show Ethiopian chromosomes only; n = 93 for non-A haplotype; n = 25 for A haplotype. It can be seen that both groups of A-haplotype chromosomes share the same modal haplotype, as do both groups of non-A chromosomes. The microsatellites tested are located in intron 16 of MCM6 and introns 1, 2 and 13 of LCT, respectively at positions 13840816, 136804355, 136798196, 136763409, from the Human Genome Browser, July 2003 freeze (colour online).

In our own study (Mulcare 2006) we used a marker for A haplotype chromosomes so that we could compare A haplotype chromosomes which carry the −13910*T with A haplotype chromosomes which do not, thus reducing the effect of pooling haplotypes of totally different lineages. Interestingly, we can see from this that the microsatellite haplotype that carries −13910*T is also the most frequent of the ancestral A haplotype chromosomes in Europeans, and also in non-Europeans. It can also be seen that within the non-A lineages there is a fairly frequent microsatellite haplotype which occurs in Europeans as well as non-Europeans (Fig. 5). It is associated with the B core haplotype in Europeans, and non-persistence. These observations suggest demographic factors additional to selection for one particular allele, as proposed previously (Hollox et al. 2001). Indeed, in the case of European lactase persistence, recent demic computer simulations indicate that the spread of farming from the near east during the Neolithic transition may have contributed to the high frequencies and genetic homogeneity of lactase persistence on the continent (Y. Itan, M. Thomas et al. manuscript in preparation).

Historical origins of lactase persistence; dating of the lactase persistence associated alleles

Each of the microsatellite diversity studies used the microsatellites to attempt to date the expansion of the −13910*T allele; the date ranges were 7,450–12,300 years ago (Coelho et al. 2005) and 7,400–10,200 years ago (Mulcare 2006), which agree with the date estimate of 2,188–20,650 years ago obtained from extended haplotypes (Bersaglieri et al. 2004). These dates are consistent with models of selection for lactase persistence along with the recent practice of dairying, approximately 9,000 years ago in Europe. Ancient DNA data obtained from human bones have shown that the −13910*T allele was either absent, or present at low frequencies, in early Neolithic Europeans. This is consistent with the −13910*T allele age estimates and supports a model whereby the cultural trait of dairying was adopted prior to lactase persistence becoming frequent (Burger et al. 2007).

The newly discovered −14010*C allele is also reported to occur as part of an unusually extended haplotype, suggesting that Africans too carry these signatures of recent positive selection for lactase persistence. In this case the allele is estimated to be between 1,200 and 23,200 years old (Tishkoff et al. 2007).

The identification of the newly associated alleles themselves suggests that lactase persistence has arisen and been selected for independently in several different human populations, and thus that the ability to digest milk has been extremely advantageous, at least for some, in the last few thousand years.

What were the evolutionary forces?

Because of the worldwide distribution of lactase persistence and the generally coinciding pattern of historically milk-drinking populations, Simoons and McCracken independently suggested, more than 30 years ago, that milk dependence created strong selection for lactase persistence (McCracken 1971; Simoons 1970). This has become known as the ‘culture historical hypothesis’, and suggests that the rise in lactase persistence co-evolved alongside the cultural adaptation of milk drinking, and its associated nutritional benefits. Nevertheless, the correlation is not absolute and there are exceptions in both directions. For example, some ethnic groups rely heavily on milk products, and cows or camels play a very important role in their lifestyle, yet they have a low reported frequency of lactase persistence; these include the Dinka and Nuer in Sudan (Bayoumi et al. 1982) and the Somali in Ethiopia (Ingram 2008). Statistical modelling shows that an incomplete correlation can be accommodated if some lactase persistent populations have recently stopped milking or, conversely, if some populations have only recently adopted the habit, leaving insufficient time for lactase persistence to be driven to high frequency (Aoki 1986). Population migration may also have played an important role. In addition, the cultural practice of milk fermentation (e.g. to yoghurt or cheese) reduces lactose content, allowing non-persistent individuals to benefit from milk products.
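
The point about insufficient time can be illustrated with a minimal sketch. This is not the model of Aoki (1986); it is a textbook deterministic recursion for a dominant advantageous allele, offered only to show how the number of generations needed to reach a given frequency depends on the strength of selection. The function name, the starting frequency, the selection coefficients and the target frequency are all hypothetical choices, not values taken from the literature reviewed here.

```python
def generations_to_reach(p0, s, target, max_generations=10_000):
    """Count generations until a dominant advantageous allele reaches `target`.

    Assumed (hypothetical) fitnesses: carriers of one or two copies of the
    allele have fitness 1 + s, non-carriers have fitness 1. Drift, migration
    and cultural change are all ignored.
    Returns None if the target is not reached within max_generations.
    """
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        q = 1.0 - p
        # Mean fitness: carriers (frequency 1 - q^2) gain s, non-carriers do not.
        mean_fitness = 1.0 + s * (1.0 - q * q)
        # Standard single-locus recursion for a dominant favoured allele.
        p = p * (1.0 + s) / mean_fitness
    return None


if __name__ == "__main__":
    # Hypothetical inputs: allele starts at 1% and must reach 50%.
    for s in (0.01, 0.05, 0.10):
        print(f"s = {s:.2f}: {generations_to_reach(0.01, s, 0.5)} generations")
```

Under these assumptions, weaker selection or a more recent adoption of milking leaves the allele at a lower frequency after the same number of generations, which is the qualitative argument made above.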

Holden and Mace, using regression analyses and correcting for relatedness of different populations, claimed that lactose digestion capacity had most likely evolved as an adaptation to dairying, and concluded that high frequency lactose digestion capacity had never ‘evolved’ without the prior presence of milking (Holden and Mace 1997). Other evidence suggested to be in support of the culture-historical hypothesis has been provided by the observation that high intra-allelic diversity of cattle milk protein genes in Europe coincides with the geographic incidence of lactase persistence, which is consistent with large herd sizes kept for dairying and selection for high milk yields (Beja-Pereira et al. 2003).

However, it is noteworthy that at least in the Somali, one of us (CI) has obtained data to suggest that significant quantities of fresh milk are consumed by many who are lactase non-persistent (Ingram 2008), apparently without any adverse effects, and it seems likely that adaptation of the colonic bacterial flora allows digestion of lactose by these people. This means that under normal circumstances lactase persistence is unlikely to be under very strong selection in this population, and fits with the hypothesis that dairying and milk drinking can emerge before the genetic adaptation. It is likely that the strong selective force operates only at certain times and under more extreme circumstances, such as drought and famine. This is an extension of the arid climate hypothesis, first suggested by Cook and al-Torki (1975). These authors speculated that in desert climates (i.e. Middle and Near East) where water and food were scarce, nomadic groups could survive by utilizing milk as a food source, and in particular, as a source of clean, uncontaminated fluid (Cook and al-Torki 1975). This scenario is particularly pertinent to desert nomads whose major source of milk is camels, as these animals are able to survive up to 2 weeks without food and water by metabolising the fat contained in their humps. The benefits to persistent individuals may have become more pronounced during outbreaks of diarrhoeal disease, when non-persistent individuals would be unable to utilize milk as a water source without exacerbating their condition.

More recent research sought to address the question of why some populations and not others had adopted the cultural habit of milk drinking. The frequencies of lactose malabsorption were greater in populations where environmental conditions, such as extremes of climate or high incidence of endemic cattle disease, made it impossible to raise livestock (Bloom and Sherman 2005). The exceptions to the general distribution were a number of African groups with high lactase persistence frequency who managed to circumvent harsh environmental conditions by adopting a pastoralist way of life (Bloom and Sherman 2005).

Obviously, the benefits of milk drinking in Northern Europe cannot be explained by the arid climate hypothesis. Here, the advantage of improved calcium absorption has been suggested to explain the distribution of the trait (Flatz and Rotthauwe 1973). The low light levels experienced at high latitudes are associated with an increased risk of developing rickets and osteomalacia due to a lack of vitamin D, which is synthesized by the skin in the presence of sunlight. Vitamin D is involved in the gut absorption of calcium, which is itself an essential mineral required for bone health. In addition, calcium may help to prevent rickets by impairing the breakdown of vitamin D in the liver (Thacher et al. 1999). Although lactase non-persistent individuals could obtain calcium from yoghurt or cheese (dairy foods that contain reduced lactose), milk proteins and lactose are believed to facilitate the absorption of calcium (for review see Gueguen and Pointillart 2000). Hence the ability to drink fresh milk, which contains both calcium and components that stimulate its uptake (including small amounts of vitamin D), may have provided an advantage to persistent individuals.

Just one hypothesis has been put forward which suggests selection for lactase non-persistence. Since lactase non-persistence is the ancestral state, the need to invoke selection for non-persistence is counter-intuitive, but should not be ignored. In this proposal the selective agent is thought to be malaria (Anderson and Vullo 1994). This proposal came from the observations of high frequency of lactase non-persistence in regions where malaria is endemic, and that individuals with flavin deficiency are at a slightly reduced risk of infection by malaria. The consumption of milk, which is rich in riboflavins, was therefore proposed to be unfavourable since it would keep flavin levels in the bloodstream high. There is currently no support for this hypothesis (Meloni et al. 1998), and it seems unlikely to contribute to the current distribution of lactase persistence.

Present day health and medical considerations

Lactose malabsorption can readily be confused with milk protein allergy, which has quite different causes (reviewed in Crittenden and Bennett 2005), and in recent times lactose intolerance has been blamed for causing a variety of systemic conditions, often without clear evidence (Campbell and Matthews 2005; Matthews et al. 2005). Nonetheless it does appear that consumption of milk and milk products by those who cannot digest lactose is a relatively common cause of irritable bowel syndrome in Europe and the USA (Vesa et al. 2000). Many commercial dairy products and other foods (including yoghurts) contain high concentrations of lactose introduced in manufacturing, so that lactose is more widespread in a person’s diet today than it was for that person’s ancestors. Lactose tolerance testing can be a useful way of detecting lactose malabsorption and enabling avoidance of the cause, but DNA testing is not yet useful, particularly for non-Europeans (Swallow 2006; Tag et al. 2008; Weiskirchen et al. 2007). In countries such as Finland, where there is a high frequency of lactase non-persistence in comparison with the rest of northern Europe, commercial low lactose products are readily available (Harju 2003).

Many association studies have attempted to demonstrate the health benefits of milk consumption in lactase persistent people, e.g. by providing protection against osteoporosis (Enattah et al. 2005a, b; Meloni et al. 2001; Obermayer-Pietsch et al. 2004), and others have claimed adverse effects of lactase persistence and associated high milk consumption (e.g. cataracts, ovarian cancer and diabetes) (Enattah et al. 2004; Larsson et al. 2006; Meloni et al. 2001; Meloni et al. 1999; Villako and Maaroos 1994). The often-contradictory findings are difficult to evaluate because of the high risk of confounding effects such as mixed ancestry, dietary intake and variation in gut flora.


Lactase persistence has been one of the leading examples of natural selection in humans, and also one of the first clear examples of polymorphism of a regulatory element. Further investigation of the molecular mechanisms as well as the evolutionary forces is, however, needed to fully understand this normal variation, which is providing an important model for understanding gene/culture co-evolution and disease susceptibility. The information accrued so far already illustrates the limitations of disease association studies and SNP tagging in finding functional genetic variation attributable to multiple mutations, even if they are located in a single gene, and highlights the potential importance of distant regulatory elements.


CJEI and CAM were funded by BBSRC CASE studentships and YI was funded by UCL Graduate school, UCL ORS and B’nai B’rith/Leo Baeck London Lodge scholarships. We thank Neil Bradman, The Centre for Genetic Anthropology, UCL, for access to samples and Melford Charitable Trust for funding.

Supplementary material

439_2008_593_MOESM1_ESM.pdf (809 kb)
Supplementary figure 1a Interpolated maps of the ‘old world’ showing the distribution of (a) lactase persistence data taken from the literature (Supplementary data Table 1), (b) -13910*T distribution (c) lactase persistence frequency predicted from -13910*T distribution, using the data collection to be found in Supplementary data Table 3. Maps were generated using PYNGL ( Only includes individuals over 12 years of age, who are unrelated, and literature for which the original publications have been located and checked. Articles in which there was clear selection bias, and recent immigrant populations are excluded, but the data can be found in Supplementary data Table 1. The Americas are excluded from all maps because of the paucity of data. Most data are obtained from lactose tolerance tests using either breath hydrogen or blood glucose, though in some cases enzyme assay data were available. Locations were either as described precisely in the publication, or taken from capital cities or central points of a country or region where precise location is not mentioned. Where more than one data set was available weighted averages of the data were taken. Predicted frequency taken to be p² + 2pq, where p is the frequency of -13910*T. Data points are shown as dots. It should be noted that the interpolation is inaccurate where there are few data points (PDF 809 kb)
439_2008_593_MOESM2_ESM.pdf (752 kb)
Supplementary figure 1b See supplementary figure 1a for legend (PDF 752 kb)
439_2008_593_MOESM3_ESM.pdf (766 kb)
Supplementary figure 1c See supplementary figure 1a for legend (PDF 766 kb)
439_2008_593_MOESM4_ESM.pdf (40 kb)
Primary source literature references of lactase persistence and lactose tolerance data used for maps depicting geographic distribution. Columns show numbers of people tested, country of origin, ethnic group, test method, and whether or not the data fulfilled all the criteria for inclusion (original reference found and checked; unrelated individuals; age 12 or more; unbiased selection criteria - e.g. not selected from patients with diarrhoea). Reasons for non-inclusion are shown in the notes. In those cases where children or family members were individually identifiable, they were excluded from the data sets and this is reflected in the numbers given. Recent immigrant populations are excluded from the maps shown in the review article. Locations (longitude and latitude) were either as described precisely in the original publication, or taken from capital cities or central points of a country in those cases that the precise location is not mentioned. Data included only in review articles were not used. Reviews searched for source references include: Flatz, (1987); Scrimshaw and Murray, (1988); Swallow and Hollox, (1999) (PDF 39.5 kb)
439_2008_593_MOESM5_ESM.pdf (29 kb)
Estimates of lactase persistence frequency in different countries obtained by adjusting for population census size (taken from CIA data; World Fact Book) (PDF 29.0 kb)
439_2008_593_MOESM6_ESM.pdf (22 kb)
Literature and own frequency data for -13910*T. Data taken from SNP typing tests as well as from resequencing. Predicted lactase persistence frequency attributable to this allele taken to be p² + 2pq (PDF 21.7 kb)
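
The predicted lactase persistence frequency used in the supplementary legends is the expected proportion of people carrying at least one -13910*T allele (homozygotes plus heterozygotes) under Hardy-Weinberg proportions, with p the frequency of -13910*T and q = 1 − p. A minimal sketch of that calculation follows. The function names are illustrative, and weighting pooled data sets by sample size is an assumption: the legends state only that weighted averages were taken, not how the weights were chosen.

```python
def predicted_persistence(p):
    """Expected frequency of T carriers (TT + TC): p^2 + 2pq with q = 1 - p.

    Assumes Hardy-Weinberg proportions and that one copy of the allele
    is sufficient for lactase persistence.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return p * p + 2.0 * p * q


def pooled_allele_frequency(datasets):
    """Weighted mean allele frequency across several data sets.

    `datasets` is a sequence of (allele_frequency, sample_size) pairs.
    Weighting by sample size is an assumption made for this sketch.
    """
    datasets = list(datasets)
    total = sum(n for _, n in datasets)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    return sum(f * n for f, n in datasets) / total


if __name__ == "__main__":
    # Hypothetical data sets for one location: (frequency of T, chromosomes typed).
    p = pooled_allele_frequency([(0.2, 100), (0.4, 300)])
    print(f"pooled p = {p:.3f}, predicted persistence = {predicted_persistence(p):.3f}")
```

Because the prediction counts only carriers of -13910*T, it will underestimate persistence wherever other causal alleles are present, as discussed for African populations.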

Copyright information

© Springer-Verlag 2008