Immunogenetics, Volume 56, Issue 4, pp 225–237

SNP haplotypes and allele frequencies show evidence for disruptive and balancing selection in the human leukocyte receptor complex

Authors

    • Clinical Transplantation Laboratory, Guy’s Hospital
  • Mark A. Cook
    • Cancer Research UK Institute for Cancer Studies, University of Birmingham
    • Histocompatibility and Immunogenetics, National Blood Service
  • B. Sean Carey
    • Clinical Transplantation Laboratory, Guy’s Hospital
    • Institute of Urology and Nephrology, University College
  • Christine V. F. Carrington
    • Department of Pre-Clinical Sciences, Faculty of Medical Sciences, University of the West Indies
  • David H. Verity
    • Department of Ophthalmology, St Thomas’ Hospital-King’s College
  • Kamran Hameed
    • Department of Medicine, Aga Khan University Hospital
  • D. Dan Ramdath
    • Department of Pre-Clinical Sciences, Faculty of Medical Sciences, University of the West Indies
  • Dasnayanee Chandanayingyong
    • Department of Transfusion Medicine, Siriraj Hospital
  • Mark Leppert
    • Eccles Centre for Human Genetics, University of Utah
  • Henry A. F. Stephens
    • Clinical Transplantation Laboratory, Guy’s Hospital
    • Institute of Urology and Nephrology, University College
    • Clinical Transplantation Laboratory, Guy’s Hospital
Original Paper

DOI: 10.1007/s00251-004-0674-1

Cite this article as:
Norman, P.J., Cook, M.A., Carey, B.S. et al. Immunogenetics (2004) 56: 225. doi:10.1007/s00251-004-0674-1

Abstract

The human leukocyte receptor complex (LRC) of Chromosome 19q13.4 encodes polymorphic and highly homologous genes that are expressed by cells of the immune system and regulate their function. There is an enormous diversity at the LRC, most particularly the variable number of killer cell immunoglobulin-like receptor (KIR) genes. KIR have been associated with several disease processes due to their interaction with polymorphic human leukocyte antigen class I molecules. We have assessed haplotype compositions, linkage disequilibrium patterns and allele frequencies in two Caucasoid population samples (n=54, n=100), using a composite of single-nucleotide polymorphism (SNP) markers and high-resolution, allele-specific molecular genotyping. Particular KIR loci segregated with SNP and other markers, forming two blocks that were separated by a region with a greater history of recombination. The KIR haplotype composition and allele frequency distributions were consistent with KIR having been subject to balancing selection (Watterson’s F: P=0.001). In contrast, there was a high inter-population heterogeneity measure for the LRC-encoded leukocyte immunoglobulin-like receptor A3 (LILRA3), indicating pathogen-driven disruptive selection (Wright’s FST=0.32). An assessment of seven populations representative of African, Asian and Caucasoid ethnic groups (total n=593) provided little evidence for long-range LRC haplotypes. The different natural selection pressures acting on each locus may have contributed to a lack of linkage disequilibrium between them.

Keywords

KIR; Leukocyte receptor complex; Natural killer cell; SNP; Haplotype; Natural selection

Introduction

Leukocyte receptor complex (LRC) molecules are expressed by cells of the immune system and control their potentially self-destructive activities (Carrington and Norman 2003). In humans, many of the LRC interact with human leukocyte antigen (HLA) molecules on target cells. Because HLA and LRC genes segregate independently, both are highly polymorphic and can be expressed in the absence of their respective ligand, incompatible combinations and inappropriate immunological activity may arise (Vilches and Parham 2002). Killer cell immunoglobulin-like receptors (KIR), which are expressed by natural killer (NK) and some T cells, have been the most extensively characterised of the LRC. Inherited or imposed combinations of HLA and KIR molecules can determine the balance between disease control (Martin et al. 2002a) and autoimmune tissue damage (Martin et al. 2002b) or allogeneic transplant rejection (Chang et al. 2002; Parham and McQueen 2003). An awareness of the scale and functional implications of genetic variation for all the LRC molecules will be essential in determining their evolutionary history and contribution to the pathology of infectious, autoimmune and malignant diseases.

Linkage disequilibrium (LD), which can indicate co-segregation of syntenic alleles, is crucial to the assessment of genes predisposing to disease and its severity. Analyses of LD patterns have provided valuable information towards describing genetic organisation, for disease associations and investigating the evolution and functions of relevant molecules (Bamshad and Wooding 2003; Gabriel et al. 2002; Sabeti et al. 2002). Characterisation of LRC haplotypes is confounded by the variable number of KIR loci (Uhrberg et al. 1997; Wende et al. 2000; Wilson et al. 2000), with an enormous diversity of gene content occurring both within and between populations (Hsu et al. 2002a; Yawata et al. 2002a). During gametogenesis, genetic information may be exchanged by recombination between homologous chromosomes. Multi-locus KIR haplotypes have been created by unequal recombination that has occurred between or among loci, occasionally resulting in duplication (Martin et al. 2003; Vilches and Parham 2002; Williams et al. 2003). In contrast, other LRC families such as the leukocyte immunoglobulin-like receptors (LILR) have more conserved gene number (Fig. 1). Superimposed on KIR gene content is allelic diversity at nearly all of the LRC loci (Carrington and Norman 2003; Shilling et al. 2002a; Young et al. 2001), with many phenotypic differences between allotypes being directly attributable to these genetic differences (Shilling et al. 2002b; Vilches and Parham 2002). Population and segregation data have shown that certain groups of KIR loci are consistently in LD, the most common haplotype so far encountered having seven genes (haplotype A, Hsu et al. 2002a; Yawata et al. 2002a), whilst there are other loci unlikely to be detected together. Despite our expanding knowledge of KIR loci variability, little is known of their allelic diversity, and only two full-length LRC haplotypes have been fully sequenced (Wende et al. 2000; Wilson et al. 2000).
Fig. 1

Gene map of the human leukocyte receptor complex (LRC) on Chromosome 19q13.4. The loci studied here are shown in black (smaller blocks correspond to variable-content regions, 2–5 loci each). Three killer cell immunoglobulin-like receptor (KIR) genes, 3DL3, 2DL4 and 3DL2, and one pseudogene, 3DP1, which have been found on almost every haplotype are termed framework loci; between them lie the regions of variable content. Ten single-nucleotide polymorphisms (SNPs) were genotyped (1–4 in 3DL3, 5–7 in 2DL4 and 8–10 in 3DL2); for clarity only three are shown. Leukocyte immunoglobulin-like receptor A3 (LILRA3) gene is partially deleted in some individuals (Torkar et al. 2000). The various isotypes of each LRC family are defined by the number of Ig subunits and their ability to inhibit or stimulate cytolytic activity (for complete descriptions see Carrington and Norman 2003; Vilches and Parham 2002)

Rapid evolution of the KIR loci is evident from comparison with primates (Guethlein et al. 2002; Khakoo et al. 2000; Rajalingam et al. 2001b), suggesting that—like their HLA counterparts—KIR molecules are subject to significant natural selection pressures. The major histocompatibility complex (MHC), which encodes HLA and many other molecules, is characterised by high levels of variation and long range, but irregular, LD. Strong LD can be observed for distances >1 Mb in the MHC, possibly due to selection for distinct haplotypes (e.g. Ajioka et al. 1997; Trachtenberg et al. 1995). Conversely, some closely linked loci have little allelic association due to areas with increased levels of recombination, otherwise known as hot spots (Jeffreys and Neumann 2002; Walsh et al. 2003). We have analysed KIR haplotype variation and the extent of LD between the KIRs and LILRA3 (Fig. 1) to assess the impact of natural selection on the LRC region. Due to the inherent difficulty in obtaining a true genotype for variable-content regions, we have utilised single-nucleotide polymorphic (SNP) markers in the flanking framework loci (Fig. 1) and other LRC markers. Several SNP substitutions in LRC molecules are already known to be important moderators of function, as they can dictate expression, activity or binding specificity, prevent transcription or drastically alter the structure of the final molecule (Boyington et al. 2001; Pando et al. 2003).

Materials and methods

KIR haplotypes

One hundred healthy, unrelated United Kingdom (UK) Caucasoid individuals from southeast England (Norman et al. 2001) and 27 Utah families recruited through the Utah Genomic Reference Project (UGRP) were studied (see Hall et al. 2002). Three generations were available from most of the Utah families, with an average of seven individuals in the third generation. All UGRP subjects gave informed consent under University of Utah IRB approved protocol 6090-96.

Markers and genotyping

Each individual was subjected to locus-specific analysis for all known KIR (i.e. presence or absence of 2DL1, 2DL1v (2DL1*004), 2DL2, 2DL3, 2DL4, 2DL5A, 2DL5B, 3DL1, 3DL2, 3DL3, 2DS1, 2DS2, 2DS3, 2DS4, 2DS5 and 3DS1). The KIR locus molecular-genotyping methods were described previously (Norman et al. 2002); all other markers and the primers used to detect them are detailed in Tables 1 and 2. KIR locus typing provides a profile of the KIR repertoire but little indication of copy number for the variable-content genes. Therefore, ten SNPs in three of the ubiquitous (framework) KIR were analysed to determine their degree of co-segregation with the irregular genes. All three of these framework loci were detected in all ~600 Caucasoid individuals examined so far (Hsu et al. 2002b; Yawata et al. 2002a), and so it was reasonable to assume that they were present on every Caucasoid haplotype studied here. SNPs were chosen to span each KIR framework locus and with rare allele frequency >0.1 where possible (Table 1). In addition to the framework SNPs, the 3DP1 pseudogene was genotyped for two alleles that are distinguished by a 1.5-kb deletion; 2DS4 was genotyped for two alleles that are distinguished by a 22-bp deletion; 2DL2 was genotyped for a single SNP; and 3DL1 was genotyped at high resolution. DNA from previously sequenced individuals (Gardiner et al. 2001; Rajalingam et al. 2001a) was used to confirm genotyping specificity. Each of the SNP/bi-allelic reactions included one locus-specific and one allele-specific primer. All other reactions used two allele-specific primers. Consistency of genotyping was confirmed by segregation and observation of sibling-identical genotypes in the Utah group and by compliance with Hardy–Weinberg equilibrium (HWE, determined by χ² test) in both populations. The PCR conditions used for all reactions were exactly as described previously (Norman et al. 2002).
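The HWE check mentioned above can be sketched for a single bi-allelic marker as follows. This is a minimal illustration only: the function name and the genotype counts are hypothetical, and the paper does not give the exact procedure beyond naming a χ² test.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for departure from
    Hardy-Weinberg proportions at one bi-allelic marker.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele a
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Illustrative counts only; values above 3.84 reject HWE at the 5% level.
print(hwe_chi_square(36, 48, 16))
print(hwe_chi_square(50, 0, 50))
```

A statistic below the 1-degree-of-freedom critical value of 3.84 would be read as compliance with HWE at the 5% level.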
Table 1

Killer cell immunoglobulin-like receptor (KIR) framework locus single-nucleotide polymorphism (SNP) genotyping primers and allele frequencies. Nucleotide positions are from the ATG start codon of KIR2DL4, and exons are numbered according to alignment with KIR3D (2DL4 starts two codons before the other KIR and has no exon 4). Common allele (in Caucasoid) shown first. c.a.f. Common allele frequency, given separately for the United Kingdom (UK) and Utah samples. Small letters indicate sequence in intron. Where a cell lists two primers, these are the two allele-specific primers, separated by " / ". The four SNPs in 3DL3 distinguished the five known alleles of this locus

SNP [a] | Coding change | Nucleotide change | c.a.f. UK | c.a.f. Utah | Forward primers (5′→3′) | Reverse primers (3′→5′ complement) | Product (bp)
1. 3DL3 Exon 3 | Arg-His | g160a | 0.93 | 0.93 | TTCTTCTTGCTGGAGGGGC | ATCTTGGGTTTAACGAATTCAG / GTCTTGGGTTTAAYGAATTCAG | 900
2. 3DL3 Exon 3 | Val-Ile | a508g | 0.52 | 0.57 | AGGACCCCTTGCGCCTCA / GGACCCCTTGCGCCTCG | GTTTGACATTTACCATCTATCC | 1,800
3. 3DL3 Exon 5 | synonymous | a782g | 0.76 | 0.73 | CTTTGGTTCTGTCACTCACTTA | AGAGGCCGGTGAACTTAGG / CGGAGGCCGGTGAACTTAG | 1,800
4. 3DL3 Exon 5 | Asn-Thr | a875c | 0.88 | 0.92 | GGAGGCRGAGGCCGGTG | CCTACAGATGCTTCGGCTC / ACTACAGATGCTTCGGCTCT | 250
5. 2DL4 Exon 3 | Cys-Tyr | a157g | 0.63 | 0.61 | TCTTCTTGGACCAGAGTGTG | ATCGTCGTGGGTTTAACATCT / GTCGTCGTGGGTTTAACATC | 350
6. 2DL4 Exon 5 | Pro-Ala | c925g | 0.73 | 0.78 | ACCTACAGATGTCGAGGTTTT | GCGAGTGACCCACTGCCT / CCGAGTGACCCACTGCCT | 1,200
7. 2DL4 Exon 7 | synonymous | t1061c | 0.5 | 0.53 | CAGTGGCCATCATCCTCTTT / CAGTGGCCATCATCCTCTTC | ACACAGAACAGTGAACAGGG | 600
8. 3DL2 Exon 1 | synonymous | t17g | 0.61 | 0.61 | agcaccATGTCGCTCACG / cagcaccATGTCGCTCACT | CGGCCCAGCACTGTGGT | 1,500
9. 3DL2 Exon 3 | Leu-Val | c342g | 0.79 | 0.8 | CACCCAGCAACCCCG [b] / GACACCCAGCAACCCCCT | GCCCCTGCTGAAATCAGGA [b] | 1,550
10. 3DL2 Exon 9 | Thr-Met | c1196t | 0.7 | 0.7 | catcttcctccaggTATCTG | CGTACGCACAGTTGGATCA [b] / TGTACGCACAGTTGGATCAC [b] | 780

[a] The accession numbers in dbSNP for the KIR framework SNPs are as follows: 1. 001745633, 2. 000514345, 3. 001745634, 4. 001442497, 5. 000519527, 6. 000521877, 7. 000519802, 8. 000519817, 9. 000519754, 10. 001745635

[b] Primer modified or from Shilling et al. (2002a)

Table 2

KIR markers, population frequencies and primers used for their assay. An empty primer or product cell indicates that no separate entry is given for that row

Marker | Allele frequency UK | Allele frequency Utah | Forward primers (5′→3′) | Reverse primers (3′→5′ complement) | Product (bp)
3DL1*001 | 0.16 | 0.16 | TACAAAGAAGACAGAATCCACA | CCCTATCAGTTGTCAGCTC [a] | 1,600
3DL1*002 | 0.22 | 0.17 | CCATCGGTCCCATGATGCT | CGGGGAGCCCATGAACGT | 2,000
3DL1*002/1502 | - [b] | 0.03 | CCATCGGTCCCATGATGCT | CAGCTCCCGGAGCTCCTA | 1,900
3DL1*004 | 0.17 | 0.18 | CAGACACCTGCATGTTCTC | CCAACAGCGAGGtaggtg | 650
3DL1*005 | 0.15 | 0.18 | ACTCTTCGGTGTCACTATCG | TCCTATCAGTTGTCAGCTCC [a] | 1,600
3DL1*006 | 0 | 0 | CCAAGGCCAATTTCTCCATT | AGAGAGAAGGTTTCTCATATG | 1,700
3DL1*007 | 0.03 | 0 | ACCCCAGACACCTGCACG | CCAACAGCGAGGtaggtg | 650
3DL1*008 | 0.04 | 0.02 | AGAGGGCCGGTCCACACG | ATCATAGGTTTAACAATTTCATG | 900
3DS1 | 0.21 | 0.26 | tgcaccggcagcaccATGT | CGGAGGACACGTGACTCTT | 1,900
2DL2*001/2 [c] | 0.17 | 0.16 | GAGGGGGAGGCCCATGAAT | TAGGtgaggaaaccccatatct | 200
2DL2*003/4 [c] | 0.18 | 0.2 | | CAGGtgaggaaaccccatatct |
2DL5A [d] | 0.22 | 0.21 | TCGGGGTTCACACCCACG | ACTCTCAGCCCAGCCGG | 1,000
2DL5B [d] | 0.14 | 0.15 | CGGGGTTCACACCCGCG | |
2DS4del [e] | 0.66 | 0.6 | CTTGTCCTGCAGCTCCATC | CGTCACAGGtgaggaaacc | 210
2DS4norm [e] | 0.27 | 0.28 | TTGTCCTGCAGCTCCCGG | | 220
3DP1del | 0.82 | 0.81 | agagggagggagtgccac | CCGTTTTCATAGGCCCTGT | 200
3DP1norm | 0.17 | 0.18 | gggagaatcttctgacacgt | | 300

[a] Primer modified or from Shilling et al. (2002a)

[b] Only discriminated by segregation

[c] 2DL2*003 not observed in either population here

[d] Duplicated locus. These primer combinations discriminate between them in Caucasoid populations only (unpublished observation)

[e] -del Deletion alleles of 2DS4 and 3DP1

KIR3DL1/S1 genotyping

Allele-specific PCR reactions were designed to detect all known alleles of 3DL1 and 3DS1 in separate reactions (Tables 1, 2). Three alleles of 3DS1, *001–*003, were not detected in initial experiments (not shown). All alleles of 3DL1—apart from *006—were detected here, and HWE was observed in both Caucasoid populations (Tables 1, 2). Genotype also correlated with flow-cytometric phenotype for all individuals (not shown). For example, 3DL1 phenotyping demonstrated that none of the 19 independently segregating *004 alleles exhibited DX9 (3DL1-specific) monoclonal antibody staining, consistent with the lack of cell-surface expression of 3DL1*004 (Pando et al. 2003).

LILRA3 polymorphism/LRC haplotypes

LILRA3 genotyping

Genotyping for seven variants of LILRA3 by allele-specific PCR was performed exactly as described (Norman et al. 2003). Two African (n=60, n=50), two South Asian (n=92, n=103), Palestinian (n=100), Thai (n=119) and UK Caucasoid (n=172) population samples were studied. All sample collections and KIR genotype profiles were as described previously (Carrington et al. 2002; Cook et al. 2003; Norman et al. 2001, 2002). All populations consisted of healthy, unrelated individuals, and institutional ethical approval was obtained. LD between KIR and LILRA3 was analysed using SNP genotypes for UK and Utah populations and the KIR2DL2/KIR2DL3 genotypes for all populations.

Statistical analysis

The LD parameters Δ and D′ were calculated for SNP pairs using the expectation-maximisation (EM) algorithm of EHPLUS (Zhao et al. 2000). D′ was used in addition to Δ, as it is less dependent on the rare allele frequency. A χ² test was also performed to test for significant departure from the null hypothesis that pairwise LD was due to random association. Two-locus haplotype frequencies for each LILRA3-KIR combination were also estimated and compared to those expected under equilibrium, using EHPLUS.
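As a minimal sketch of these two measures, the code below computes D, D′ and Δ from phase-known two-locus haplotype counts rather than from the EM estimates that EHPLUS produces. The function name and the example counts are hypothetical, and treating Δ as the correlation coefficient D/√(pA(1−pA)pB(1−pB)) is an assumption about the definition used.

```python
from math import sqrt


def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, D', delta) from phase-known two-locus haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total          # frequency of allele B at locus 2
    d = n_AB / total - p_A * p_B         # raw disequilibrium coefficient
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0
    # Delta taken here as the correlation between alleles (an assumption).
    denom = sqrt(p_A * (1 - p_A) * p_B * (1 - p_B))
    delta = d / denom if denom > 0 else 0.0
    return d, d_prime, delta


# Illustrative counts only (not data from the study).
print(pairwise_ld(50, 10, 10, 30))
print(pairwise_ld(60, 0, 10, 30))   # one haplotype absent, so |D'| is 1
```

When one of the four haplotypes is absent, |D′| reaches 1 whatever the allele frequencies, whereas Δ does so only when the allele frequencies at the two loci match. This fits the statement that D′ is less dependent on the rare allele frequency.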

Ten-locus SNP haplotype frequencies were estimated from the UK population and Utah parents using EM, giving 80% correlation with those deduced by segregation as predicted by simulation (Fallin and Schork 2000). When each individual framework-locus three- or four-SNP haplotype frequency was estimated by compacting the ten-locus haplotypes (Single et al. 2002), there was 98–99% correlation. Haplotype frequencies for each of the individual framework loci that were estimated in this manner were identical for the UK and Utah populations.
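The principle of EM haplotype-frequency estimation can be illustrated for the simplest case of two bi-allelic loci, where only double heterozygotes have ambiguous phase. This is a sketch under stated assumptions, not the EHPLUS implementation: the function name, the input layout and the counts are hypothetical, and the ten-locus estimation used in the study is considerably more involved.

```python
def em_two_locus(genotype_counts, iterations=100):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotype_counts[i][j] is the number of individuals carrying i copies of
    allele A (locus 1) and j copies of allele B (locus 2), with i, j in 0..2.
    """
    n = genotype_counts
    n_individuals = sum(sum(row) for row in n)
    # Haplotypes that can be counted directly because phase is unambiguous.
    base = {
        "AB": 2 * n[2][2] + n[2][1] + n[1][2],
        "Ab": 2 * n[2][0] + n[2][1] + n[1][0],
        "aB": 2 * n[0][2] + n[1][2] + n[0][1],
        "ab": 2 * n[0][0] + n[1][0] + n[0][1],
    }
    double_het = n[1][1]
    freq = {h: 0.25 for h in base}       # uniform starting point
    for _ in range(iterations):
        # E-step: probability that a double heterozygote is AB/ab, not Ab/aB.
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        x = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-count haplotypes using the expected phase assignment.
        counts = {
            "AB": base["AB"] + x * double_het,
            "ab": base["ab"] + x * double_het,
            "Ab": base["Ab"] + (1 - x) * double_het,
            "aB": base["aB"] + (1 - x) * double_het,
        }
        freq = {h: c / (2 * n_individuals) for h, c in counts.items()}
    return freq


# Illustrative counts only: 30 aa/bb, 40 double heterozygotes, 30 AA/BB.
example = [[30, 0, 0], [0, 40, 0], [0, 0, 30]]
print(em_two_locus(example))
```

With these counts the unambiguous individuals carry only AB and ab haplotypes, so the estimate assigns the double heterozygotes almost entirely to the AB/ab phase.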

As the Utah haplotypes were determined by segregation analysis, it was possible to regard individual variable content loci as bi-allelic during subsequent analysis, where allele 1 = locus present and allele 2 = locus absent; the common allele for each locus was determined empirically, but can also be obtained from the respective gene frequency in the UK Caucasoid group in each case (Norman et al. 2001).

Wright’s FST

Wright’s FST provides an indication of the variation in allele frequency relative to other alleles or loci amongst a group of populations (Wright 1951). Genetic drift is the random change in allele frequency over time that has not been caused by selective pressure. Under drift alone there is an expected distribution of allele frequencies amongst a group of populations; loci that do not follow this distribution are likely to have been affected by natural selection (Bowcock et al. 1991; Cavalli-Sforza et al. 1994). Wright’s FST was calculated as $F_{ST} = V_p / [\bar{p}(1-\bar{p})]$, where $V_p = \sum (p - \bar{p})^2 / (n-1)$ is the variance of the allele frequency $p$ across the populations, $\bar{p}$ is the average allele frequency and $n$ is the number of populations.
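As a worked check, the calculation can be sketched in a few lines, assuming the sample-variance form (squared deviations divided by n−1) and the unweighted mean across populations. The function name is hypothetical; the input frequencies are the LILRA3*006 and *003 values listed in Table 4.

```python
from statistics import mean, variance


def wright_fst(freqs):
    """F_ST = V_p / (p_bar * (1 - p_bar)) for one allele across populations."""
    p_bar = mean(freqs)          # unweighted average allele frequency
    v_p = variance(freqs)        # sample variance, divisor n - 1
    return v_p / (p_bar * (1 - p_bar))


# Allele frequencies in the seven populations, in the order given in Table 4.
lilra3_006 = [0.47, 0.71, 0.08, 0.04, 0.32, 0.21, 0.05]
lilra3_003 = [0.15, 0.11, 0.53, 0.47, 0.39, 0.53, 0.68]

print(round(wright_fst(lilra3_006), 2))  # 0.32
print(round(wright_fst(lilra3_003), 2))  # 0.18
```

Both results agree with the FST values reported for these alleles in Table 4.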

Watterson’s homozygosity

Allele frequency distributions within populations can be summarised by the sum of the squares of the frequencies (Watterson’s F-statistic, Watterson 1975). Under neutral evolution, F will fall into an expected range, dependent on the number of alleles and individuals in the population sample. Balancing selection results in an even allele frequency distribution and an observed F that is significantly lower than that expected (Watterson 1975). Observed F was compared to the expected range available from http://allele5.biol.berkeley.edu/homozygosity/homozygosity.html.
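The observed statistic is simple to compute, as the sketch below shows using the UK 3DL1/3DS1 allele frequencies from Table 2. The function name is hypothetical. Because the tabulated frequencies are rounded (they sum to 0.98), the result approximates the published value of 0.176 without reproducing it exactly. The expected range under neutrality is not computed here.

```python
def watterson_f(freqs):
    """Observed homozygosity: the sum of squared allele frequencies."""
    return sum(p * p for p in freqs)


# UK frequencies of the alleles detected (Table 2); rounded values.
uk_3dl1 = {
    "*001": 0.16, "*002": 0.22, "*004": 0.17, "*005": 0.15,
    "*007": 0.03, "*008": 0.04, "3DS1": 0.21,
}

print(round(watterson_f(uk_3dl1.values()), 3))  # 0.172

# For comparison: k equally frequent alleles give the minimum possible F of 1/k.
print(round(watterson_f([1 / 7] * 7), 3))       # 0.143
```

An even distribution pushes F towards 1/k, which is why a low observed F relative to the neutral expectation is read as evidence of balancing selection.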

Results

Deduced haplotypes

All Utah family members were molecularly genotyped; pedigrees were then decoded; and the parents from each family pooled for population tests (n=54). Phase was assigned for parental haplotypes (n=108) by segregation in appropriate family members. The haplotypes are described fully in Fig. 2. The only assumptions made initially were that every haplotype contained a copy of each of the framework loci and 3DP1 [there was at least one where this was not the case—h01 (Fig. 2), which lacked 3DP1, 2DL4 and 3DL/S1], the two variants of 2DS4 were alleles of a single locus (Maxwell et al. 2002) and that 2DL2 and 2DL3 were essentially alleles in Caucasoids (Trowsdale et al. 2001).
Fig. 2

Full-resolution KIR haplotypes deduced from Utah Caucasoid families and arranged according to branch tips from neighbour-joining analysis [performed to show haplotype groupings, not to infer phylogenies, using PHYLIP (http://evolution.genetics.washington.edu/phylip.html)]. Locus order is according to Hsu et al. (2002a), except 2DS3, which may also occur in the telomeric portion of the KIR cluster (Hsu et al. 2002a). The three framework KIRs surround the regions of variable gene content. Each SNP is labelled with common allele first. Boldface indicates coding change. Del deletion, a by virtue of framework SNP pattern, denotes the 22 different gene-content haplotypes. Most were corroborated in other Caucasoid populations (Gomez-Lozano et al. 2002; Hsu et al. 2002a)

There were 22 distinct KIR gene-content haplotypes from the 108 determined. At the highest resolution analysed, there were 75 distinct KIR haplotypes from the 108 total. These high-resolution haplotypes fell into seven distinct clades (I–VII, Fig. 2). Each haplotype clade had a distinct pattern of markers, and many of the markers appeared only in subsets of the haplotype groups. Three of the haplotype clades (V–VII) were comprised of the A gene-content haplotypes, further demonstrating that gene content is just one component of KIR diversity.

Several markers were chosen based on assumptions of their functional importance, so that they were more likely to describe useful associations than those randomly selected (Collins et al. 1997). The particular markers were: (1) 2DL4 SNP t1061, which has been shown to indicate the presence of truncated 2DL4 in another Caucasoid population as it is less than 50 bp upstream from and in complete LD with this single base insertion/deletion site (Witt et al. 2002). (2) 3DL1 allele *004 is not expressed at the cell surface, *005, *006 and *007 have low expression, and *001, *1502 and *008 have higher expression (Pando et al. 2003; Vilches and Parham 2002). Also, 3DS1 is an allele of 3DL1 (Trowsdale et al. 2001), but with opposite function. Here, KIRs 3DL1 and 3DS1 did appear to be alleles (Σallele frequency ~1, Table 2), although two haplotypes contained both (h18, h27; Fig. 2), and one was found with neither (h01, Fig. 2). One of the haplotypes with both 3DS1 and 3DL1 (h18) is apparently identical to a haplotype that has been described recently in greater detail and also contains two copies of 2DL4 (Martin et al. 2003; Williams et al. 2003). Other haplotypes presented here may also contain more than one copy of any KIR in addition to those shown. (3) 2DL5, which has been duplicated and one form, 2DL5B*002 (2DL5.2), is not expressed due to promoter region polymorphism (Vilches and Parham 2002). KIR haplotypes contained none, one or both form(s) of 2DL5 (Fig. 2), as previously observed (Gomez-Lozano et al. 2002). (4) The 22-bp deletion in 2DS4, which results in a truncated molecule with only one Ig domain (Hsu et al. 2002b; Maxwell et al. 2002).

KIRs 3DL1*001, *004 and *005 always segregated with the SNP marker for truncated 2DL4 (t1061) and also segregated with 2DS4del in 51/53 (96%) cases (Table 3). Hence, truncated 2DL4, 3DL1-null (*004) and truncated 2DS4 were predominantly cis-encoded. This low-expression module often formed a component of the A haplotype, which already has the fewest KIR genes. KIRs 3DL1*002 and *1502 (high expression) always segregated with alleles for complete 2DL4 and 2DS4. KIR3DS1 was usually on the same haplotype as complete 2DL4 and absent 2DS4.
Table 3

Linkage disequilibrium values of phase-known 2DL4 and 3DL2 SNP haplotypes for each allele of 3DL/S1. SNP common allele is shown first

Marker | *001 | *002 | *1502 | *004 | *005 | *008 | 3DS1
2DL4-a157g | 0.85 | 0.87 | (+) | 1 | −0.73 | (+) | −0.82
2DL4-c925g | (+) | −0.75 | (−) | 1 | (+) | 1 | 1
2DL4-t1061c | 1 | −1 | (−) | 1 | 1 | (−) | −0.86
2DS4-norm | - | 0.87 | 1 | - | - | - | -
2DS4-del | 0.62 | - | - | 1 | 0.76 | - | -
2DS4-neg | - | - | - | - | - | - | 0.89
3DL2-t017g | −1 | 0.87 | (+) | −1 | 0.86 | (−) | 0.82
3DL2-c341g | (+) | 1 | (+) | −0.87 | 0.75 | (+) | 0.65
3DL2-c1196t | 1 | 1 | (+) | 0.64 | (+) | (+) | −0.9

Column headings are KIR3DL/S1 alleles.

All values are D′ with common allele (uncorrected P<0.0001 unless <0.001, <0.05). When linkage disequilibrium was not significant (+) and (−) are used to indicate the most frequent SNP haplotype associated with each 3DL1 allele. Positive D′ shows that the 3DL1 allele segregates with SNP common allele, e.g. the most frequent haplotype was gcc-3DS1-2DS4neg-tct (haplotype allele frequency = 25%)

The emerging pattern of LD in the human genome reveals a metameric chromosome structure, where strong LD is localised in blocks of varying size, joined by shorter segments where crossover has been more likely (Daly et al. 2001; Gabriel et al. 2002; Jeffreys and Neumann 2002). We observed a breakdown of LD centromeric to 2DL4, which implies that this marks the boundary of two LD blocks and a recombination hot spot (Fig. 3). A sliding haplotype-window analysis (Fig. 3a) and the LD profile (Fig. 3b) showed that much of the recombination that has generated KIR haplotype diversity occurs in the region centromeric to 2DL4 and close to the pseudogene 3DP1. There was complete LD amongst certain groups of centromeric KIR cluster genes and amongst groups in the telomeric portion, but there was little obvious correlation between any centromeric KIR and 2DL4 (Fig. 3b). Also, the LD between adjacent KIR framework SNP pairs showed a sharp decline from 3DL3 to 2DL4 (Fig. 3c). This analysis shows that the KIR cluster comprises two main LD blocks which split the haplotypes into two ‘halves’, consistent with the proposal that many of the various gene-content haplotypes have arisen by rearrangements involving smaller, component haplotypes (Hsu et al. 2002a; Yawata et al. 2002b). The 3DL3 SNPs were evenly distributed amongst the haplotype groupings (Fig. 2), but there was strong LD amongst certain pairs of SNPs within 3DL3 (Fig. 3), and this may indicate the presence of a further less-defined LD block, with a recombination zone telomeric to 3DL3 (shown as the smaller block in Fig. 3b). The major recombination hot spot occurs in the interval between 3DP1 and 2DL4. This 13-kb segment between the two loci may be the longest non-coding section within the otherwise gene-dense KIR cluster and remained conserved between the haplotypes so far sequenced (Trowsdale et al. 2001). This conserved sequence may increase the likelihood of crossover by homologous recombination (Trowsdale et al. 2001). LD is most predictable within blocks although recombination is still possible, and LD can exist between markers in neighbouring blocks (Trachtenberg et al. 1995), as observed here (Fig. 3b).
Fig. 3a–c

KIR haplotypes show recombination hot spot centromeric to 2DL4. Using phase-deduced haplotypes generated from Utah Caucasoid families. SNPs 1–10 shown next to their respective loci (e.g. 3DL3-1). a Sliding-window analysis of five locus haplotypes. The number of distinct haplotypes was counted from each consecutive group of five markers. Genotype resolution of respective markers was reduced so that all were bi-allelic with rare allele frequency >0.1. Thus, the maximum number of haplotypes for each window is 6; when >6 this most likely indicates that recombination has occurred, although reversion or repeat mutation may not be ruled out (Clark et al. 1998). The analysis does not continue beyond 2DS4 because this marks the last complete window. b Statistically significant linkage disequilibrium (LD) values are concentrated in two distinct blocks. Lower triangle LD blocks illustrated using HaploBlockFinder (Zhang and Jin 2003); LD values are as indicated by colour scheme. Upper triangle Statistical significance was assessed using Fisher’s exact test from two-locus haplotypes determined in phase by segregation. The exact test does not indicate whether LD between common alleles is positive or negative; this can be derived from the haplotype structures (Fig. 2). + Indicates loci that were tested only for presence/absence. c LD between consecutive SNPs. Three adjacent 3DL3 SNP pairs had |D′ |=1, indicating either two or three haplotypes were present for each of these pairs; thus recombination may not have occurred. However, SNP 3DL3-4 was in equilibrium with SNP 2DL4-5. The values shown were obtained using the Utah parents’ genotypes (a population sample of n=54); the United Kingdom (UK) Caucasoid (n=100) graph was identical
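The sliding-window count described for panel a can be sketched as follows, assuming each phased haplotype is written as a string with one character per bi-allelic marker. The function names and the toy haplotypes are hypothetical. The threshold of window size plus one reflects the caption's maximum of six distinct haplotypes for five markers in the absence of recombination or repeat mutation.

```python
def distinct_haplotypes_per_window(haplotypes, window=5):
    """Count distinct sub-haplotypes in each run of `window` consecutive markers."""
    n_markers = len(haplotypes[0])
    return [
        len({h[start:start + window] for h in haplotypes})
        for start in range(n_markers - window + 1)
    ]


def windows_suggesting_recombination(haplotypes, window=5):
    """Flag windows holding more than window + 1 distinct haplotypes."""
    counts = distinct_haplotypes_per_window(haplotypes, window)
    return [count > window + 1 for count in counts]


# Toy phased haplotypes over six bi-allelic markers ("1" = common, "2" = rare).
toy = ["111111", "111112", "222221", "222222"]
print(distinct_haplotypes_per_window(toy))    # [2, 4]
print(windows_suggesting_recombination(toy))  # [False, False]
```

Using a set of string slices keeps the count independent of how often each haplotype occurs, so only the number of distinct marker patterns in a window matters.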

KIR SNP genotypes

There was a high heterozygosity of KIR SNP genotypes (Fig. 2). The mean SNP heterozygosity (0.36 Utah, 0.38 UK) was higher than that observed for the peptide-fragment contacting codons of HLA (0.3, Hedrick et al. 1991). All subjects from both populations were heterozygous for at least one KIR region marker and only two unrelated (UK) individuals had identical genotypes (not shown). When all full-resolution KIR haplotypes (see Fig. 2) were pooled and the population re-sampled under the assumption of random mating, only 1% of pairs matched. Thus, only one UK individual (≈randomly selected Utah haplotype pair) was homozygous for all KIR markers.
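A mean SNP heterozygosity of this size can be approximated from the common allele frequencies in Table 1, assuming the expected heterozygosity 2p(1−p) for each bi-allelic SNP. The text does not state whether observed or expected heterozygosity was used, so this is only an illustrative sketch, and the function name is hypothetical.

```python
def mean_expected_heterozygosity(common_allele_freqs):
    """Average of 2p(1 - p) over bi-allelic markers."""
    values = [2 * p * (1 - p) for p in common_allele_freqs]
    return sum(values) / len(values)


# UK common allele frequencies for framework SNPs 1-10 (Table 1).
uk_caf = [0.93, 0.52, 0.76, 0.88, 0.63, 0.73, 0.5, 0.61, 0.79, 0.7]
print(round(mean_expected_heterozygosity(uk_caf), 2))  # 0.38
```

The result is sensitive to rounding in the tabulated frequencies, so it need not reproduce a reported mean exactly.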

KIR SNP haplotypes

The two Caucasoid populations displayed a similar profile of LD values amongst SNPs in the LRC region. The statistical association calculated between each pair of SNP markers showed that LD was not always related to the physical distance (in base pairs) between markers (Fig. 4). For instance, there was maximum LD (|D′|=1) between markers <100 bp apart as well as between those over 100 kbp apart. However, the statistically significant LD values (uncorrected P<10^−4) were restricted to the shorter intervals between SNP markers. Examples of strong LD were observed within and between the framework loci (Fig. 3b), the most consistent associations being between 2DL4 and 3DL2. SNP 6 (2DL4-c925g) had strong LD with seven SNP markers and significant LD with seven variable-content loci (Fig. 3b). SNP 7 (2DL4-t1061c) had strong LD with only three other SNPs but significant LD with 12 variable-content loci (Fig. 3b). There was a decline of LD over the 450 kb between KIR and LILRA3 in both Caucasoid populations (Fig. 4).
Fig. 4a, b

LD versus distance (base pairs) for all combinations of LRC SNPs in two Caucasoid populations. a UK (n=100); b Utah (n=54). White circles correspond to the magnitude of D′ (|D′|), black circles when P<10^−4. The four LILRA3 coding-change SNPs were in complete LD within LILRA3 (Norman et al. 2003) and so were pooled for this analysis

KIR3DL1 allele frequencies

The allele frequencies observed in both the Utah and UK Caucasoid populations are presented in Table 2. Using this PCR-SSP genotyping scheme, seven alleles were detected in each of the two groups. The allele frequencies were evenly distributed between the two groups, and the common alleles (*001, *002, *003, *004, *005 and 3DS1) were evenly dispersed within each group. To test whether the within-population frequency distribution was likely to have arisen under neutral evolution, we applied Watterson’s F-statistic. Using the 3DL1 allele frequencies that are shown in Table 2, F (Fobs) for the UK population was 0.176, which was significantly lower than the expected (Fexp=0.421, P=0.001). Fobs for the Utah population was 0.156, which was also significantly lower than the Fexp of 0.38 (P=0.0001). Thus, the 3DL1 alleles were more evenly distributed in these two populations than would be expected under neutral evolution (Watterson 1975).

LILRA3 allele frequencies and LRC haplotypes

As there was little evidence for significant LD between KIR and LILRA3 (Figs. 3, 4), several more populations were investigated. Six more population samples were genotyped using allele-specific oligonucleotide primer combinations for the proposed unique protein coding variants of LILRA3. Observed and expected two-locus haplotype frequencies between LILRA3 and KIR were compared for all populations, but no significant LD was observed (not shown). Of interest, there were significant LILRA3 allele frequency variations across the populations studied (Table 4). LILRA3*003 was the most common allele in all populations apart from the two African groups, where *006 was the most frequent. LILRA3 deletion was present in all populations studied and most common in Caucasoid and Thai individuals, but did not display the highest inter-population variation. HWE was observed for all alleles in each population (not shown). Wright’s FST calculations indicated a marked inter-population variation for LILRA3 alleles. This variation was especially apparent for the frequently occurring *003 and *006, but the less common alleles also showed statistically significant variation amongst the study populations (Table 4).
Table 4

Leukocyte immunoglobulin-like receptor A3 (LILRA3) allele frequencies for seven populations representative of Caucasoid, African and Asian ethnicities. Allele frequencies from genotype (2n)

| Population | Number of individuals | *001 | *003 | *005del | *006 | *007 | *008 | *009 |
|---|---|---|---|---|---|---|---|---|
| African 1 (Trinidad) | 60 | 0 | 0.15 | 0.12 | 0.47 | 0.09 | 0.12 | 0.04 |
| African 2 (UK) | 50 | 0.04 | 0.11 | 0.06 | 0.71 | 0 | 0.07 | 0.01 |
| Palestinian | 100 | 0.17 | 0.53 | 0.1 | 0.08 | 0 | 0.1 | 0.01 |
| Caucasoid | 172 | 0.14 | 0.47 | 0.26 | 0.04 | 0 | 0.08 | 0.002 |
| Thai | 119 | 0.01 | 0.39 | 0.21 | 0.32 | 0 | 0 | 0.08 |
| South Asian 1 (Trinidad) | 103 | 0.03 | 0.53 | 0.08 | 0.21 | 0.01 | 0.12 | 0.01 |
| South Asian 2 (Pakistan) | 92 | 0.03 | 0.68 | 0.1 | 0.05 | 0 | 0.13 | 0.01 |
| P<ᵃ | - | 10⁻⁶ | 10⁻¹⁰ | 10⁻⁴ | 10⁻¹⁰ | 10⁻⁵ | 10⁻⁸ | 10⁻⁵ |
| Wright’s FSTᵇ | - | 0.08 | 0.18 | 0.05 | 0.32 | 0.08 | 0.02 | 0.04 |

ᵃ Corrected for multiple tests; for highest versus lowest allele frequency, calculated using the χ² proportions test or, for small counts, Fisher’s exact test.

ᵇ Wright’s FST for KIR loci using the same population samples: 0.02 (2DL2) to 0.07 (2DS4), from Norman et al. (2002).
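
The Wright's FST row of Table 4 can be approximated per allele as the variance of that allele's frequency among populations divided by p̄(1 − p̄), where p̄ is the mean frequency. The sketch below is illustrative only: it assumes an unweighted mean across the seven populations and the sample variance (n − 1 denominator), and the estimator actually used for the table is not stated in this section. Under those assumptions the *006 and *003 columns give values close to the tabulated 0.32 and 0.18.

```python
from statistics import mean, variance


def wright_fst(freqs):
    """Single-allele FST: variance of the allele frequency among
    populations divided by p_bar * (1 - p_bar).

    Assumptions of this sketch: unweighted mean across populations and
    the sample variance (n - 1 denominator).
    """
    p_bar = mean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("allele is fixed or absent in all populations")
    return variance(freqs) / (p_bar * (1.0 - p_bar))


# Frequencies copied from Table 4, in table row order.
lilra3_006 = [0.47, 0.71, 0.08, 0.04, 0.32, 0.21, 0.05]
lilra3_003 = [0.15, 0.11, 0.53, 0.47, 0.39, 0.53, 0.68]

fst_006 = wright_fst(lilra3_006)
fst_003 = wright_fst(lilra3_003)
print(f"FST *006 = {fst_006:.2f}, FST *003 = {fst_003:.2f}")
```

Weighting populations by sample size, or dividing by n instead of n − 1, would give somewhat different values, so the agreement with the table should be read as suggestive rather than as confirmation of the original method.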

Discussion

We have described 108 KIR/LILR haplotypes that were deduced by segregation in families sampled from a Caucasoid population, using a composite of high-resolution, allele-specific and SNP markers. The statistical associations amongst the markers analysed, and thus the underlying haplotypes, were very similar in a separate Caucasoid sample that was analysed without knowledge of linkage phase. Haplotypes can be informative genetic indicators of disease associations or diagnosis, and coding-region SNPs in particular will provide useful and biologically relevant markers (Collins et al. 1997). Despite the complex nature of the KIR cluster, framework SNPs revealed sufficient structure for subsequent detection of associations with disease or phenotype. Approximately 50% of the KIR SNP pairs had |D′| above 0.3 (Figs. 3, 4), which is likely to be the practical threshold for association studies (Kruglyak 1999). The strategy of choosing three SNPs per locus meant that most were in LD with at least one other marker, and that the different SNP haplotypes were characterised by distinct patterns of markers associated with them. Examples of strong LD were observed within and between the framework loci (Fig. 3b). Most markers would thus have useful LD with several others, implying that further variation will be detected by virtue of association. Moreover, various characteristics such as the lack of 2DL4, 2DS4 and 3DL1 expression appear to be linked as they often co-segregate, so that the major KIR region haplotype blocks may be viewed as the unit of functional variability.
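
For readers less familiar with the |D′| measure cited above, the following sketch shows how Lewontin's normalised coefficient is obtained for a pair of biallelic markers. The haplotype and allele frequencies are hypothetical and the function name is illustrative; the example does not reproduce any value from Figs. 3 or 4.

```python
def d_prime(p_ab, p_a, p_b):
    """Lewontin's normalised LD coefficient D' for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    if d_max == 0:
        return 0.0  # a marker is monomorphic, so D' is undefined; report 0
    return d / d_max


# Hypothetical marker pair: D = 0.30 - 0.40 * 0.50 = 0.10, D_max = 0.20.
value = d_prime(p_ab=0.30, p_a=0.40, p_b=0.50)
print(f"|D'| = {abs(value):.2f}; above 0.3 threshold: {abs(value) > 0.3}")
```

Because D′ is scaled by the largest D that the allele frequencies permit, it reaches 1 whenever at least one of the four possible haplotypes is absent, regardless of how common the alleles are.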

Some human KIR, such as 2DL4, 2DS4 and 3DL1, have alleles that are severely mutated or not expressed. Similar 2DL4 mutations are found in several other species and both known orang-utan alleles are inactivated or truncated (Guethlein et al. 2002). Normal and truncated 2DL4-encoding haplotypes were observed here in equal proportions (Fig. 2). One human haplotype lacked this locus (h01, Fig. 2), and individuals can lack 2DL4 altogether (Norman et al. 2002). There was also a high prevalence of inactivated or absent 2DS4, and normally expressed alleles of 3DL1 had a combined allele frequency of only 42% (Table 2). Many of the most frequent A haplotypes encode a truncated 2DS4 variant (Hsu et al. 2002a; Maxwell et al. 2002); this analysis has shown that these same haplotypes, and thus ~10% of apparently healthy Caucasoids, concurrently have reduced expression of 3DL1 and a truncated 2DL4 (Table 3).

KIR interact with HLA to control immunologically active cells, but when absent their functions are compensated by other molecules (Vilches and Parham 2002). Despite this redundancy, observations presented here and elsewhere suggest that KIR have been subject to strong natural selection forces during their evolution. KIR evolution may be driven by the need to recognise HLA (Vilches and Parham 2002); alternatively, some KIR may recognise markers of cellular invasion, thus driving their polymorphism at a rate equivalent to that of HLA (Martin et al. 2002b). Many of the HLA loci have been under positive balancing selection, which enhances the number of non-synonymous substitutions that reach polymorphic frequency and maintains heterozygosity. Other characteristics of balanced selection include locus inactivation, strong LD, even distribution of alleles and prevalence of coding-change substitutions, indicating that certain alleles have responded to specific selecting events (Bowcock et al. 1991; Fay et al. 2001; Hedrick et al. 1991; Hughes and Yeager 1998; Slatkin 2000). HLA heterozygosity enables a wide range of peptides to be presented and a more efficient T-cell repertoire to develop within the individual (Hughes and Yeager 1998; Messaoudi et al. 2002). Accordingly, there is a high level of KIR polymorphism (Vilches and Parham 2002), strong LD amongst the loci (e.g. see Shilling et al. 2002b; Figs. 2, 3b) and a high ratio of coding-change SNPs in the HLA-contacting Ig domains (Yawata et al. 2002a). In our analysis both Watterson’s F and Wright’s FST, which are independent statistical tests for distribution within and between populations, respectively, indicated that the KIR frequencies we observe today are not likely to have arisen during neutral evolution. The various alleles of 3DL1 were distributed in the two Caucasoid population groups such that maximum heterozygosity was obtained in both.
Watterson’s F-test, which may not always yield a statistically significant result, even for classical HLA loci (Bugawan et al. 2000), produced a significant result for each of two independently sampled Caucasoid populations. By contrast, in these same populations, the LILRA3 frequencies were unevenly distributed. Wright’s FST compared the KIR locus frequencies between populations; when there are many alleles at similar frequencies, the probability of finding a homozygous genotype in a population is similar to that of obtaining a homozygote by random pairing between populations and the FST is low. Classical HLA loci have many more alleles than the number expected under neutral evolution and high heterozygosity, and thus low FST (Bowcock et al. 1991; Fay et al. 2001; Hedrick et al. 1991). The FST values for KIR were similar to or lower than those for HLA from a similar selection of population groups (Bowcock et al. 1991). FST values were also calculated using the HLA genotypes of our population samples (data not shown), with very similar results to those previously detailed (Bowcock et al. 1991; Cavalli-Sforza et al. 1994), thus supporting our findings. Although this part of our analysis was confined to the KIR locus frequencies in the different ethnic populations, and more information will be gleaned when it is possible to obtain allele frequencies from these KIR in the population samples, low FST values provide further evidence for balanced selection acting on KIR.

Strong LD and common haplotypes that are distinguishable from the highly heterogeneous background are hallmarks of positive balanced selection (Sabeti et al. 2002; Slatkin 2000). The complexity of KIR locus SNP arrays, coupled with the presence of distinct haplotypes (Table 3; Fig. 2) resembled that observed for MHC (e.g. Ajioka et al. 1997; Graham et al. 2002), which has a turnover of selectively advantageous alleles and haplotypes in response to varying and rapidly evolving pathogen challenge (Nei et al. 1997). If an allele has a strong selective advantage, then it will increase in frequency but have low haplotype diversity (Fay et al. 2001). The haplotypes then disperse by recombination and mutation, contributing to the background diversity we observe. For example, 3DS1 occurred predominantly with a single 2DL4/3DL2 SNP array (Table 3). There was less variation of SNP markers on full-length 3DS1 haplotypes than on the most frequent A haplotypes (Fig. 2), implying 3DS1 or another locus on this very distinct haplotype has had a selection advantage (see Hamblin et al. 2002). KIR3DL1*004, which is a non-expressed allele (Pando et al. 2003), was also associated with a distinct SNP haplotype (Table 3) and may also have conferred a natural advantage.

LILRA3

Wright’s FST calculations indicated a marked inter-population variation for all LILRA3 alleles (Table 4). FST distributions are not necessarily comparable for different groups of populations, but analysis of a similar group of populations (two African, two Asian and two European Caucasoid) for 100 polymorphic variations (Bowcock et al. 1991) showed very few genes with inter-population diversity of the magnitude observed here. For example, the value of 0.32 for LILRA3*006 placed this in the 95th percentile of observed FST values, showing an uneven allele frequency distribution that is characteristic of disruptive selection (Bowcock et al. 1991). The only marker with consistently higher FST values than LILRA3 was Duffy antigen, which can confer resistance to Plasmodium vivax infection (Cavalli-Sforza et al. 1994). Disruptive selection implies that a particular allele has an advantage or a disadvantage in one or more population groups because that population is subjected to an environmental circumstance that is not shared by all the others. Although one must never rule out genetic drift as the cause of allele frequency variation between populations (Cavalli-Sforza et al. 1994), and we possess only one line of evidence for this locus, our results suggest that mutations at LILRA3 or a nearby locus confer an advantage in some populations, possibly in response to a single pathogen.

LRC haplotypes

Despite the recombination hot spot, there were several haplotypes that remained identical throughout the KIR cluster, but there was a decline of LD over the 450 kb between KIR and LILRA3. This contrasts with the MHC region, which characteristically displays strong LD over longer distances. For the average chromosomal region LD declines, albeit irregularly, with distance until ~500 kb, where there is no difference from unlinked loci (Abecasis et al. 2001; Dunning et al. 2000). Thus, it appears that the interval between LILRA3 and KIR displays normal LD decay, and there is no statistically significant long-range association (Fig. 4).

Conclusions

The evolutionary path of KIR has been diverging from that of LILR (Trowsdale et al. 2001; Vilches and Parham 2002), and the analysis presented here has shown in modern human populations that this may have been due to different selective pressures acting on each. LILRA3 has apparently been under disruptive selective pressure, whilst KIR has evolved to maximise heterozygosity, contributing to a lack of LD between closely linked loci and few common haplotypes that extend throughout the LRC. The LD between loci under different selective pressures is expected to be zero (Nei 1987). Gene duplication events in the MHC can result in the loss of polymorphism for the duplicated and adjacent loci (Parham 1994). The lack of LD between LILR and KIR may indicate evolution of a system generating diversity in gene number for a set of loci, while simultaneously maintaining the degree of allelic diversity for nearby genes.

Acknowledgements

Many thanks to Sheila Fisher, Jerry Lanchbury and to the members of the Parham Laboratory for advice and control DNA samples. This investigation was supported by a Public Health Services research grant no. MO1-RR00064 from the National Center for Research Resources to the Huntsman General Clinical Research Center at the University of Utah. It was also supported by generous gifts from the W.M. Keck Foundation and the George S. and Dolores Doré Eccles Foundation (Utah) and a grant from the Howard Ostins trust fund (UK). We would like to extend our sincere thanks to all participants and personnel. The work described was performed in accordance with all appropriate regulations.

Copyright information

© Springer-Verlag 2004