SNP haplotypes and allele frequencies show evidence for disruptive and balancing selection in the human leukocyte receptor complex
- Cite this article as:
- Norman, P.J., Cook, M.A., Carey, B.S. et al. Immunogenetics (2004) 56: 225. doi:10.1007/s00251-004-0674-1
The human leukocyte receptor complex (LRC) of Chromosome 19q13.4 encodes polymorphic and highly homologous genes that are expressed by cells of the immune system and regulate their function. There is an enormous diversity at the LRC, most particularly the variable number of killer cell immunoglobulin-like receptor (KIR) genes. KIR have been associated with several disease processes due to their interaction with polymorphic human leukocyte antigen class I molecules. We have assessed haplotype compositions, linkage disequilibrium patterns and allele frequencies in two Caucasoid population samples (n=54, n=100), using a composite of single-nucleotide polymorphism (SNP) markers and high-resolution, allele-specific molecular genotyping. Particular KIR loci segregated with SNP and other markers, forming two blocks that were separated by a region with a greater history of recombination. The KIR haplotype composition and allele frequency distributions were consistent with KIR having been subject to balancing selection (Watterson’s F: P=0.001). In contrast, there was a high inter-population heterogeneity measure for the LRC-encoded leukocyte immunoglobulin-like receptor A3 (LILRA3), indicating pathogen-driven disruptive selection (Wright’s FST=0.32). An assessment of seven populations representative of African, Asian and Caucasoid ethnic groups (total n=593) provided little evidence for long-range LRC haplotypes. The different natural selection pressures acting on each locus may have contributed to a lack of linkage disequilibrium between them.
Keywords: KIR, Leukocyte receptor complex, Natural killer cell, SNP, Haplotype, Natural selection
Leukocyte receptor complex (LRC) molecules are expressed by cells of the immune system and control their potentially self-destructive activities (Carrington and Norman 2003). In humans, many of the LRC interact with human leukocyte antigen (HLA) molecules on target cells. Because HLA and LRC genes segregate independently, are both highly polymorphic and can be expressed in the absence of their respective ligands, incompatible combinations and inappropriate immunological activity may arise (Vilches and Parham 2002). Killer cell immunoglobulin-like receptors (KIR), which are expressed by natural killer (NK) and some T cells, have been the most extensively characterised of the LRC. Inherited or imposed combinations of HLA and KIR molecules can determine the balance between disease control (Martin et al. 2002a) and autoimmune tissue damage (Martin et al. 2002b) or allogeneic transplant rejection (Chang et al. 2002; Parham and McQueen 2003). An awareness of the scale and functional implications of genetic variation for all the LRC molecules will be essential in determining their evolutionary history and contribution to the pathology of infectious, autoimmune and malignant diseases.
Rapid evolution of the KIR loci is evident from comparison with primates (Guethlein et al. 2002; Khakoo et al. 2000; Rajalingam et al. 2001b), suggesting that, like their HLA counterparts, KIR molecules are subject to significant natural selection pressures. The major histocompatibility complex (MHC), which encodes HLA and many other molecules, is characterised by high levels of variation and long-range, but irregular, linkage disequilibrium (LD). Strong LD can be observed for distances >1 Mb in the MHC, possibly due to selection for distinct haplotypes (e.g. Ajioka et al. 1997; Trachtenberg et al. 1995). Conversely, some closely linked loci have little allelic association due to areas with increased levels of recombination, otherwise known as hot spots (Jeffreys and Neumann 2002; Walsh et al. 2003). We have analysed KIR haplotype variation and the extent of LD between the KIRs and LILRA3 (Fig. 1) to assess the impact of natural selection on the LRC region. Due to the inherent difficulty in obtaining a true genotype for variable-content regions, we have utilised single-nucleotide polymorphism (SNP) markers in the flanking framework loci (Fig. 1) and other LRC markers. Several SNP substitutions in LRC molecules are already known to be important moderators of function, as they can dictate expression, activity or binding specificity, prevent transcription or drastically alter the structure of the final molecule (Boyington et al. 2001; Pando et al. 2003).
Materials and methods
One hundred healthy, unrelated United Kingdom (UK) Caucasoid individuals from southeast England (Norman et al. 2001) and 27 Utah families recruited through the Utah Genomic Reference Project (UGRP) were studied (see Hall et al. 2002). Three generations were available from most of the Utah families, with an average of seven individuals in the third generation. All UGRP subjects gave informed consent under University of Utah IRB approved protocol 6090-96.
Markers and genotyping
Killer cell immunoglobulin-like receptor (KIR) framework locus single-nucleotide polymorphism (SNP) genotyping primers and allele frequencies. Nucleotide positions are from ATG start codon of KIR2DL4, and exons are numbered according to alignment with KIR3D (2DL4 starts two codons before the other KIR and has no exon 4). Common allele (in Caucasoid) shown first. c.a.f. Common allele frequency [top number United Kingdom (UK), bottom number Utah]. Small letters indicate sequence in intron. The four SNPs in 3DL3 distinguished the five known alleles of this locus
1. 3DL3 Exon 3
2. 3DL3 Exon 3
3. 3DL3 Exon 5
4. 3DL3 Exon 5
5. 2DL4 Exon 3
6. 2DL4 Exon 5
7. 2DL4 Exon 7
8. 3DL2 Exon 1
9. 3DL2 Exon 3
10. 3DL2 Exon 9
KIR markers, population frequencies and primers used for their assay
Allele-specific PCR reactions were designed to detect all known alleles of 3DL1 and 3DS1 in separate reactions (Tables 1, 2). Three alleles of 3DS1, *001–*003, were not detected in initial experiments (not shown). All alleles of 3DL1—apart from *006—were detected here, and HWE was observed in both Caucasoid populations (Tables 1, 2). Genotype also correlated with flow-cytometric phenotype for all individuals (not shown). For example, 3DL1 phenotyping demonstrated that none of the 19 independently segregating *004 alleles exhibited DX9 (3DL1-specific) monoclonal antibody staining, consistent with the lack of cell-surface expression of 3DL1*004 (Pando et al. 2003).
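The Hardy-Weinberg equilibrium (HWE) check mentioned above can be illustrated for a single bi-allelic marker. The sketch below is only an illustration of a standard chi-square goodness-of-fit test, not the procedure used in this study: the function name and the genotype counts are hypothetical, and a multi-allelic locus such as 3DL1 would need a more general test.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of HWE for one bi-allelic marker (hypothetical helper).

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only, not data from this study
    print(hwe_chi_square(30, 50, 20))
```

A small chi-square value (large P) is consistent with HWE; the counts shown are placeholders.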
LILRA3 polymorphism/LRC haplotypes
Genotyping for seven variants of LILRA3 by allele-specific PCR was performed exactly as described (Norman et al. 2003). Two African (n=60, n=50), two South Asian (n=92, n=103), Palestinian (n=100), Thai (n=119) and UK Caucasoid (n=172) population samples were studied. All sample collections and KIR genotype profiles were as described previously (Carrington et al. 2002; Cook et al. 2003; Norman et al. 2001, 2002). All populations consisted of healthy, unrelated individuals, and institutional ethical approval was obtained. LD between KIR and LILRA3 was analysed using SNP genotypes for UK and Utah populations and the KIR2DL2/KIR2DL3 genotypes for all populations.
The LD parameters Δ and D′ were calculated for SNP pairs using the expectation-maximisation (EM) algorithm of EHPLUS (Zhao et al. 2000). D′ was used in addition to Δ, as it is less dependent on the rare allele frequency. A χ2 test was also performed to test for significant departure from the null hypothesis that pairwise LD was due to random association. Two-locus haplotype frequencies for each LILRA3–KIR combination were also estimated and compared to those expected under equilibrium, using EHPLUS.
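As an illustration of the pairwise LD measures named above, the following sketch computes D, D′ and Δ for two bi-allelic markers from a haplotype frequency and the two allele frequencies. It assumes that Δ denotes the correlation coefficient between alleles (often written r) and that the χ2 statistic is taken as N·Δ² with one degree of freedom, where N is the number of haplotypes; the function names are hypothetical, and this is not the EHPLUS implementation.

```python
from math import sqrt


def ld_measures(p_ab, p_a, p_b):
    """Return (D, D_prime, delta) for two bi-allelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2)
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # Delta is taken here as the allelic correlation coefficient (assumption)
    denom = sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    delta = d / denom if denom > 0 else 0.0
    return d, d_prime, delta


def ld_chi_square(delta, n_haplotypes):
    """Chi-square statistic (1 d.f.) for the null hypothesis of no association."""
    return n_haplotypes * delta ** 2


if __name__ == "__main__":
    # Illustrative frequencies only, not data from this study
    d, d_prime, delta = ld_measures(p_ab=0.30, p_a=0.40, p_b=0.50)
    print(d, d_prime, delta, ld_chi_square(delta, n_haplotypes=200))
```

Because D′ is normalised by the largest D attainable for the observed allele frequencies, it is less sensitive than Δ to a rare allele, which is the reason given above for reporting both.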
Ten-locus SNP haplotype frequencies were estimated from the UK population and Utah parents using EM, giving 80% correlation with those deduced by segregation as predicted by simulation (Fallin and Schork 2000). When each individual framework-locus three- or four-SNP haplotype frequency was estimated by compacting the ten-locus haplotypes (Single et al. 2002), there was 98–99% correlation. Haplotype frequencies for each of the individual framework loci that were estimated in this manner were identical for the UK and Utah populations.
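The idea behind EM haplotype frequency estimation can be shown for the simplest case of two bi-allelic SNPs, where only double heterozygotes have ambiguous phase. This is a minimal sketch under that assumption, with a hypothetical function name and genotype coding; the ten-locus estimates reported here were obtained with dedicated software, not with this code.

```python
def em_two_locus(genotypes, n_iter=100):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), where g1 and g2 are the number of copies
    (0, 1 or 2) of allele '1' at each SNP.
    Returns a dict mapping haplotype (a1, a2) to its estimated frequency.
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    freqs = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
    n_hap = 2 * len(genotypes)
    for _ in range(n_iter):
        counts = dict.fromkeys(freqs, 0.0)
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: split the double heterozygote between its two phases
                cis = freqs[(1, 1)] * freqs[(0, 0)]
                trans = freqs[(1, 0)] * freqs[(0, 1)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                counts[(1, 1)] += w
                counts[(0, 0)] += w
                counts[(1, 0)] += 1.0 - w
                counts[(0, 1)] += 1.0 - w
            else:
                # Phase is unambiguous when at most one locus is heterozygous
                h1 = (1 if g1 >= 1 else 0, 1 if g2 >= 1 else 0)
                h2 = (1 if g1 == 2 else 0, 1 if g2 == 2 else 0)
                counts[h1] += 1.0
                counts[h2] += 1.0
        # M-step: new frequencies are the expected haplotype counts
        freqs = {h: c / n_hap for h, c in counts.items()}
    return freqs


if __name__ == "__main__":
    # Illustrative genotypes only, not data from this study
    sample = [(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)] * 4
    print(em_two_locus(sample))
```

Comparing such estimates with haplotypes deduced by segregation in families, as done above for the Utah parents, is one way to gauge how well EM performs on a given marker set.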
As the Utah haplotypes were determined by segregation analysis, it was possible to regard individual variable content loci as bi-allelic during subsequent analysis, where allele 1 = locus present and allele 2 = locus absent; the common allele for each locus was determined empirically, but can also be obtained from the respective gene frequency in the UK Caucasoid group in each case (Norman et al. 2001).
Wright’s FST provides an indication of the variation in allele frequency relative to other alleles or loci amongst a group of populations (Wright 1951). Genetic drift is the random change in allele frequency over time that has not been caused by selective pressure. Under drift alone there is an expected distribution of allele frequencies amongst a group of populations; a locus that does not follow this distribution is likely to have been affected by natural selection (Bowcock et al. 1991; Cavalli-Sforza et al. 1994). Wright’s FST was calculated as FST = Vp / [p̄(1 − p̄)], where Vp = variance of the allele frequency (p) across the populations, p̄ = average allele frequency and n = number of populations: Vp = Σ(p − p̄)² / (n − 1).
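The FST calculation can be sketched in a few lines, reading the formula as the variance of an allele's frequency across populations (with n − 1 in the denominator) divided by p̄(1 − p̄), where p̄ is the mean frequency. The function name and the example frequencies are hypothetical and are not values from Table 4.

```python
from statistics import mean, variance


def wright_fst(freqs):
    """FST for one allele from its frequency in each of n populations.

    Uses the sample variance (n - 1 denominator) divided by p_bar * (1 - p_bar).
    """
    if len(freqs) < 2:
        raise ValueError("at least two populations are required")
    p_bar = mean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # allele fixed or absent everywhere: no variation to measure
    return variance(freqs) / (p_bar * (1.0 - p_bar))


if __name__ == "__main__":
    # Illustrative frequencies for one allele in seven populations
    print(wright_fst([0.05, 0.10, 0.40, 0.45, 0.20, 0.60, 0.15]))
```

An allele with similar frequencies in every population gives a value near zero, whereas large frequency differences between populations push the value upwards.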
Allele frequency distributions within populations can be summarised by the sum of the squares of the frequencies (Watterson’s F-statistic, Watterson 1975). Under neutral evolution, F will fall into an expected range, dependent on the number of alleles and individuals in the population sample. Balanced selection results in an even allele frequency distribution and an observed F that is significantly lower than that expected (Watterson 1975). Observed F was compared to the expected range available from http://allele5.biol.berkeley.edu/homozygosity/homozygosity.html.
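The observed F is simple to compute from allele counts, as the sketch below shows. The function name and counts are hypothetical; the expected range under neutrality depends on the number of alleles and the sample size and is not derived here.

```python
def watterson_f(allele_counts):
    """Observed Watterson's F: the sum of squared allele frequencies."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    return sum((c / total) ** 2 for c in allele_counts)


if __name__ == "__main__":
    # Illustrative counts only, not data from this study
    even = [25, 25, 25, 25]   # evenly distributed alleles give a low F
    skewed = [85, 5, 5, 5]    # one dominant allele gives a high F
    print(watterson_f(even), watterson_f(skewed))
```

For k alleles, F reaches its minimum of 1/k when all alleles are equally frequent, which is why an unusually low observed F points towards balancing selection.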
There were 22 distinct KIR gene-content haplotypes from the 108 determined. At the highest resolution analysed, there were 75 distinct KIR haplotypes from the 108 total. These high-resolution haplotypes fell into seven distinct clades (I–VII, Fig. 2). Each haplotype clade had a distinct pattern of markers, and many of the markers appeared only in subsets of the haplotype groups. Three of the haplotype clades (V–VII) were comprised of the A gene-content haplotypes, further demonstrating that gene content is just one component of KIR diversity.
Several markers were chosen based on assumptions of their functional importance, so that they were more likely to describe useful associations than those randomly selected (Collins et al. 1997). The particular markers were: (1) 2DL4 SNP t1061, which has been shown to indicate the presence of truncated 2DL4 in another Caucasoid population as it is less than 50 bp upstream from and in complete LD with this single base insertion/deletion site (Witt et al. 2002). (2) 3DL1 allele *004 is not expressed at the cell surface, *005, *006 and *007 have low expression, and *001, *1502 and *008 have higher expression (Pando et al. 2003; Vilches and Parham 2002). Also, 3DS1 is an allele of 3DL1 (Trowsdale et al. 2001), but with opposite function. Here, KIRs 3DL1 and 3DS1 did appear to be alleles (Σallele frequency ~1, Table 2), although two haplotypes contained both (h18, h27; Fig. 2), and one was found with neither (h01, Fig. 2). One of the haplotypes with both 3DS1 and 3DL1 (h18) is apparently identical to a haplotype that has been described recently in greater detail and also contains two copies of 2DL4 (Martin et al. 2003; Williams et al. 2003). Other haplotypes presented here may also contain more than one copy of any KIR in addition to those shown. (3) 2DL5, which has been duplicated and one form, 2DL5B*002 (2DL5.2), is not expressed due to promoter region polymorphism (Vilches and Parham 2002). KIR haplotypes contained none, one or both form(s) of 2DL5 (Fig. 2), as previously observed (Gomez-Lozano et al. 2002). (4) The 22-bp deletion in 2DS4, which results in a truncated molecule with only one Ig domain (Hsu et al. 2002b; Maxwell et al. 2002).
Linkage disequilibrium values of phase-known 2DL4 and 3DL2 SNP haplotypes for each allele of 3DL/S1. SNP common allele is shown first
KIR SNP genotypes
There was a high heterozygosity of KIR SNP genotypes (Fig. 2). The mean SNP heterozygosity (0.36 Utah, 0.38 UK) was higher than that observed for the peptide-fragment contacting codons of HLA (0.3, Hedrick et al. 1991). All subjects from both populations were heterozygous for at least one KIR region marker and only two unrelated (UK) individuals had identical genotypes (not shown). When all full-resolution KIR haplotypes (see Fig. 2) were pooled and the population re-sampled under the assumption of random mating, only 1% of pairs matched. Thus, only one UK individual (≈randomly selected Utah haplotype pair) was homozygous for all KIR markers.
KIR SNP haplotypes
KIR3DL1 allele frequencies
The allele frequencies observed in both the Utah and UK Caucasoid populations are presented in Table 2. Using this PCR-SSP genotyping scheme, seven alleles were detected in each of the two groups. The allele frequencies were evenly distributed between the two groups, and the common alleles (*001, *002, *003, *004, *005 and 3DS1) were evenly dispersed within each group. To test whether the within-population frequency distribution was likely to have arisen under neutral evolution, we applied Watterson’s F-statistic. Using the 3DL1 allele frequencies that are shown in Table 2, F (Fobs) for the UK population was 0.176, which was significantly lower than the expected (Fexp=0.421, P=0.001). Fobs for the Utah population was 0.156, which was also significantly lower than the Fexp of 0.38 (P=0.0001). Thus, the 3DL1 alleles were more evenly distributed in these two populations than would be expected under neutral evolution (Watterson 1975).
LILRA3 allele frequencies and LRC haplotypes
Leukocyte immunoglobulin-like receptor A3 (LILRA3) allele frequencies for seven populations representative of Caucasoid, African and Asian ethnicities. Allele frequencies from genotype (2n)
Number of individuals
African 1 (Trinidad)
African 2 (UK)
South Asian 1 (Trinidad)
South Asian 2 (Pakistan)
We have described 108 KIR/LILR haplotypes that were deduced by segregation in families sampled from a Caucasoid population, using a composite of high-resolution, allele-specific and SNP markers. The statistical associations amongst the markers analysed, and thus the underlying haplotypes, were very similar in a separate Caucasoid sample that was analysed without knowledge of linkage phase. Haplotypes can be informative genetic indicators of disease associations or diagnosis and coding-region SNPs in particular will provide useful and biologically relevant markers (Collins et al. 1997). Despite the complex nature of the KIR cluster, framework SNPs revealed sufficient structure for subsequent detection of associations with disease or phenotype. Approximately 50% of the KIR SNP pairs had |D′| above 0.3 (Figs. 3, 4), which is likely to be the practical threshold for association studies (Kruglyak 1999). The strategy of choosing three SNPs per locus meant that most were in LD with at least one other marker, and that the different SNP haplotypes were characterised by distinct patterns of markers associated with them. Examples of strong LD were observed within and between the framework loci (Fig. 3b). Most markers would thus have useful LD with several others, implying that further variation will be detected by virtue of association. Moreover, various characteristics such as the lack of 2DL4, 2DS4 and 3DL1 expression appear to be linked as they often co-segregate, so that the major KIR region haplotype blocks may be viewed as the unit of functional variability.
Some human KIR, such as 2DL4, 2DS4 and 3DL1, have alleles that are severely mutated or not expressed. Similar 2DL4 mutations are found in several other species and both known orang-utan alleles are inactivated or truncated (Guethlein et al. 2002). Normal and truncated 2DL4-encoding haplotypes were observed here in equal proportions (Fig. 2). One human haplotype lacked this locus (h01, Fig. 2), and individuals can lack 2DL4 altogether (Norman et al. 2002). There was also a high prevalence of inactivated or absent 2DS4 and normally expressed alleles of 3DL1 had a combined allele frequency of only 42% (Table 2). Many of the most frequent A haplotypes encode a truncated 2DS4 variant (Hsu et al. 2002a; Maxwell et al. 2002); this analysis has shown that these same haplotypes and thus ~10% apparently healthy Caucasoids concurrently have reduced expression of 3DL1 and a truncated 2DL4 (Table 3).
KIR interact with HLA to control immunologically active cells, but when absent their functions are compensated by other molecules (Vilches and Parham 2002). Despite this redundancy, observations presented here and elsewhere suggest that KIR have been subject to strong natural selection forces during their evolution. KIR evolution may be driven by the need to recognise HLA (Vilches and Parham 2002); alternatively some KIR may recognise markers of cellular invasion, thus driving their polymorphism at a rate equivalent to that of HLA (Martin et al. 2002b). Many of the HLA loci have been under positive balancing selection, which enhances the number of non-synonymous substitutions that reach polymorphic frequency and maintains heterozygosity. Other characteristics of balanced selection include locus inactivation, strong LD, even distribution of alleles and prevalence of coding-change substitutions, indicating that certain alleles have responded to specific selecting events (Bowcock et al. 1991; Fay et al. 2001; Hedrick et al. 1991; Hughes and Yeager 1998; Slatkin 2000). HLA heterozygosity enables a wide range of peptides to be presented and a more efficient T-cell repertoire to develop within the individual (Hughes and Yeager 1998; Messaoudi et al. 2002). Accordingly, there is a high level of KIR polymorphism (Vilches and Parham 2002), strong LD amongst the loci (e.g. see Shilling et al. 2002b; Figs. 2, 3b) and a high ratio of coding-change SNPs in the HLA-contacting Ig domains (Yawata et al. 2002a). In our analysis both Watterson’s F and Wright’s FST, which are independent statistical tests for distribution within and between populations, respectively, indicated that the KIR frequencies we observe today are not likely to have arisen during neutral evolution. The various alleles of 3DL1 were distributed in the two Caucasoid population groups such that maximum heterozygosity was obtained in both.
Watterson’s F-test, which may not always yield a statistically significant result, even for classical HLA loci (Bugawan et al. 2000), produced a significant result for each of two independently sampled Caucasoid populations. By contrast, in these same populations, the LILRA3 frequencies were unevenly distributed. Wright’s FST compared the KIR locus frequencies between populations; when there are many alleles at similar frequencies, the probability of finding a homozygous genotype in a population is similar to that of obtaining a homozygote by random pairing between populations and the FST is low. Classical HLA have many more alleles than the number expected under neutral evolution and high heterozygosity, thus low FST (Bowcock et al. 1991; Fay et al. 2001; Hedrick et al. 1991). The FST values for KIR were similar to or lower than those for HLA from a similar selection of population groups (Bowcock et al. 1991). FST values were also calculated using the HLA genotypes of our population samples (data not shown), with very similar results to those previously detailed (Bowcock et al. 1991; Cavalli-Sforza et al. 1994), thus supporting our findings. Although this part of our analysis was confined to the KIR locus frequencies in the different ethnic populations, and more information will be gleaned when it is possible to obtain allele frequencies from these KIR in the population samples, low FST values provide further evidence for balanced selection acting on KIR.
Strong LD and common haplotypes that are distinguishable from the highly heterogeneous background are hallmarks of positive balanced selection (Sabeti et al. 2002; Slatkin 2000). The complexity of KIR locus SNP arrays, coupled with the presence of distinct haplotypes (Table 3; Fig. 2) resembled that observed for MHC (e.g. Ajioka et al. 1997; Graham et al. 2002), which has a turnover of selectively advantageous alleles and haplotypes in response to varying and rapidly evolving pathogen challenge (Nei et al. 1997). If an allele has a strong selective advantage, then it will increase in frequency but have low haplotype diversity (Fay et al. 2001). The haplotypes then disperse by recombination and mutation, contributing to the background diversity we observe. For example, 3DS1 occurred predominantly with a single 2DL4/3DL2 SNP array (Table 3). There was less variation of SNP markers on full-length 3DS1 haplotypes than on the most frequent A haplotypes (Fig. 2), implying 3DS1 or another locus on this very distinct haplotype has had a selection advantage (see Hamblin et al. 2002). KIR3DL1*004, which is a non-expressed allele (Pando et al. 2003), was also associated with a distinct SNP haplotype (Table 3) and may also have conferred a natural advantage.
Wright’s FST calculations indicated a marked inter-population variation for all LILRA3 alleles (Table 4). FST distributions are not necessarily comparable for different groups of populations, but analysis of a similar group of populations (two African, two Asian and two European Caucasoid) for 100 polymorphic variations (Bowcock et al. 1991) showed very few genes with inter-population diversity of the magnitude observed here. For example, the value of 0.32 for LILRA3*006 placed this in the 95th percentile of observed FST values, showing an uneven allele frequency distribution that is characteristic of disruptive selection (Bowcock et al. 1991). The only marker with consistently higher FST values than LILRA3 was Duffy antigen, which can confer resistance to Plasmodium vivax infection (Cavalli-Sforza et al. 1994). Disruptive selection implies that a particular allele has an advantage or a disadvantage in one or more population group because that population is subjected to an environmental circumstance that is not shared by all the others. Although one must never rule out genetic drift as the cause of allele frequency variation between populations (Cavalli-Sforza et al. 1994), and we possess only one line of evidence for this locus, our results suggest that mutations at LILRA3 or a nearby locus confer an advantage in some populations, possibly in response to a single pathogen.
Despite the recombination hot spot, there were several haplotypes that remained identical throughout the KIR cluster, but there was a decline of LD over the 450 kb between KIR and LILRA3. This contrasts with the MHC region, which characteristically displays strong LD over longer distances. For the average chromosomal region LD declines, albeit irregularly, with distance until ~500 kb, where there is no difference from unlinked loci (Abecasis et al. 2001; Dunning et al. 2000). Thus, it appears that the interval between LILRA3 and KIR displays normal LD decay, and there is no statistically significant long-range association (Fig. 4).
The evolutionary path of KIR has been diverging from that of LILR (Trowsdale et al. 2001; Vilches and Parham 2002), and the analysis presented here has shown in modern human populations that this may have been due to different selective pressures acting on each. LILRA3 has apparently been under disruptive selective pressure, whilst KIR has evolved to maximise heterozygosity, contributing to a lack of LD between closely linked loci and few common haplotypes that extend throughout the LRC. The LD between loci under different selective pressures is expected to be zero (Nei 1987). Gene duplication events in the MHC can result in the loss of polymorphism for the duplicated and adjacent loci (Parham 1994). The lack of LD between LILR and KIR may indicate evolution of a system generating diversity in gene number for a set of loci, while simultaneously maintaining the degree of allelic diversity for nearby genes.
Many thanks to Sheila Fisher, Jerry Lanchbury and to the members of the Parham Laboratory for advice and control DNA samples. This investigation was supported by a Public Health Services research grant no. MO1-RR00064 from the National Center for Research Resources to the Huntsman General Clinical Research Center at the University of Utah. It was also supported by generous gifts from the W.M. Keck Foundation and the George S. and Dolores Doré Eccles Foundation (Utah) and a grant from the Howard Ostins trust fund (UK). We would like to extend our sincere thanks to all participants and personnel. The work described was performed in accordance with all appropriate regulations.