A combined DPA1∼DPB1 amino acid epitope is the primary unit of selection on the HLA-DP heterodimer

Here, we present results for DPA1 and DPB1 four-digit allele-level typing in a large (n = 5,944) sample of unrelated European American stem cell donors previously characterized for other class I and class II loci. Examination of genetic data for both chains of the DP heterodimer in the largest cohort to date, at the amino acid epitope, allele, genotype, and haplotype level, allows new insights into the functional units of selection and association for the DP heterodimer. The data in this study suggest that for the DPA1-DPB1 heterodimer, the unit of selection is the combined amino acid epitope contributed by both the DPA1 and DPB1 genes, rather than the allele, and that patterns of LD are driven primarily by dimer stability and conformation of the P1 pocket. This may help explain the differential pattern of allele frequency distribution observed for this locus relative to the other class II loci. These findings further support the notion that allele-level associations in disease and transplantation may not be the most important unit of analysis, and that they should be considered instead in the molecular context. Electronic supplementary material The online version of this article (doi:10.1007/s00251-012-0615-3) contains supplementary material, which is available to authorized users.


Introduction
The human leukocyte antigen (HLA) complex on chromosome 6 is the most polymorphic region of the human genome, and while doubly polymorphic heterodimeric molecules are relatively rare in human biology, this is a defining feature of the HLA class II gene products. Within the class II region, the DRB1 locus is recognized as significantly more polymorphic relative to the other beta chain genes, with >1,000 alleles recognized to date, and is nearly 150-fold more diverse than the DRA locus, where only seven alleles have been identified (http://www.ebi.ac.uk/imgt/hla/stats.html). In contrast, the HLA-DQ and HLA-DP systems are characterized by having more moderate variation overall relative to HLA-DR, but more diversity in genes for the alpha chains, resulting in less imbalance between levels of alpha and beta diversity; the ratio of beta/alpha known alleles is approximately 4:1 for these genes. In the DQ system, alpha chain diversity, coupled with relatively even allele frequency distributions for both DQA1 and DQB1 in most human populations (Fernandez-Vina et al. 1991;Solberg et al. 2008;Begovich et al. 1992;Slatkin 2000), potentially increases polymorphism at the heterodimer level. At the same time, the DQ system is characterized by near complete linkage disequilibrium (LD), and evidence suggests that the LD is driven in part by structurally permissive and nonpermissive pairings of the alpha and beta chains, which may restrict overall heterodimer polymorphism. Much previous work (Begovich et al. 1992;Bugawan et al. 2000;Fernandez-Vina et al. 1991;Hollenbach et al. 2001;Klitz et al. 2003) has demonstrated that certain combinations of DQA1 and DQB1 alleles are almost never seen on the same haplotype, and there is evidence to suggest that these pairings do not produce stable cell surface heterodimers (Kwok et al. 1993;Kwok and Nepom 1989). 
In contrast to the balanced polymorphism observed for DQA1 and DQB1, only a few alleles predominate in most human populations for both DPA1 and DPB1 (Begovich et al. 2001; Gendzekhadze et al. 2004; Pérez-Miranda et al. 2004; Solberg et al. 2008; Steiner et al. 2000), resulting in significantly less potential heterodimeric diversity within a given population. However, while much is known about alpha-beta haplotypic associations in the HLA-DQ system, DPA1 genotyping is infrequently performed, so DPA1∼DPB1 haplotypes have been described only in a limited number of studies with relatively small sample sizes (Begovich et al. 2001; Gendzekhadze et al. 2004).
Evidence for lower levels of cell surface expression of the DP molecule relative to other classical HLA molecules (Edwards et al. 1986; Guardiola and Maffei 1993) has historically led to the notion that DP may be less important than other HLA in human health. Furthermore, a lack of evidence for balancing selection, a hallmark of the other class II loci, has been interpreted as an indication that DP may not play an important role in protection from pathogens in humans. In both solid organ and hematopoietic stem cell transplantation (HSCT), standard protocols do not call for matching at DP, and DP typing is not routinely performed. However, recent data suggest that DP match status may play an important role in HSCT outcome (Crocchiolo et al. 2009; Fleischhauer et al. 2006; Gallardo et al. 2004; Shaw et al. 2003, 2006; Vrana et al. 2006; Zino et al. 2004, 2007; Fleischhauer et al. 2012), although the nature and direction of the impact are still the subject of debate. DP molecules can serve as targets of alloreactivity with clinical consequences in HSCT, demonstrated by data showing association of HLA-DP mismatching with relapse, GvHD, rejection and nonrelapse mortality after unrelated HSCT, as well as data suggesting that anti-DP donor-specific antibodies increase the risk of graft failure (Ciurea et al. 2011; Crocchiolo et al. 2009; Fleischhauer et al. 2006; Petersdorf et al. 2001; Shaw et al. 2007; Spellman et al. 2010; Zino et al. 2004; Fleischhauer et al. 2012). Likewise, anti-DP antibodies appear to play an important role in kidney transplant outcomes (Singh et al. 2010; Thaunat et al. 2009). In addition, numerous studies suggest that DPB1 is associated with predisposition to infectious disease such as hepatitis B (Howell and Visvanathan 2009; Kamatani et al. 2009), autoimmune disorders such as multiple sclerosis (Begovich et al. 1990; Odum et al. 1988) and juvenile idiopathic arthritis (Hollenbach et al. 2010), and leukemia (Taylor et al. 2002).
The very strong evidence for a role of DPB1 in development of chronic beryllium disease (CBD; Amicosante et al. 2001;Fontenot et al. 2000;Lombardi et al. 2001) suggests that despite low levels of cell surface expression, antigen presentation by the DP molecule is capable of stimulating a robust and clinically significant immune response.
Here, we present results for DPA1 and DPB1 four-digit allele-level typing in a large (n = 5,944) sample of unrelated European American stem cell donors previously characterized for other class I and class II loci (Maiers et al. 2007). Examination of genetic data for both chains of the DP heterodimer in the largest cohort to date, at the amino acid epitope, allele, genotype, and haplotype level, allows new insights into the functional units of selection and association for the DP heterodimer.

Subjects
The study cohort consisted of 5,944 (self-identified) European Caucasian individuals who donated hematopoietic stem cells for unrelated transplants facilitated by the National Marrow Donor Program (NMDP) during the years 1988 to 2003. Primary granulocyte and mononuclear cell preparations and transformed B cell lines from donors were distributed from the NMDP Research Sample Repository (Blood Systems Research Institute, San Francisco, CA) to nine laboratories for DNA preparation and HLA genotyping.
Genotyping
Samples were genotyped at allele level for DPA1 and DPB1 (results from this same cohort for other loci, HLA-A, HLA-B, HLA-C, HLA-DRB1, and HLA-DQB1, were reported previously; Klitz et al. 2003; Maiers et al. 2007). Sequence-specific oligonucleotide typing assays were performed as previously described (Williams et al. 2008) using reagents from the local laboratory, the 11th and 12th international workshops (Bignon and Fernandez-Vina 1997), and commercial vendors (Steiner et al. 2000). Sequence-based typing for DPA1 was based on a 1,366-nucleotide PCR product including exons 2 through 4 as described previously (Rozemuller et al. 1995). DPB1 generic sequencing was based on a 574-nucleotide PCR product including exon 2 prepared from genomic DNA (Versluis et al. 1993), with heterozygous ambiguity resolved by allele-specific amplification or PCR-SSP.
The 5,944 samples were genotyped by nine different laboratories between 1994 and 2004. In order to maintain consistent reporting during the course of this project, results were interpreted to only consider DPA1 and DPB1 alleles identified in the 1994 HLA Nomenclature Report (Bodmer et al. 1994). One thousand six hundred seventy-one (28.1 %) individuals (3,342 chromosomes) were genotyped by two different laboratories for quality control; the remaining 4,273 (71.9 %) individuals were genotyped by a single laboratory.

Resolution of discrepant results and ambiguous genotypes
The results were transferred from the nine laboratories via electronic files to the NMDP data center for interlaboratory comparison and for comparison with any typing previously reported by the transplant center (when available). Sixty-eight individuals (1.1 %) had previously reported results for DPB1, and two (0.03 %) had previously reported results for DPA1. A total of 20 out of 3,342 (0.6 %) discrepancies for DPB1 and 9 out of 3,342 (0.26 %) for DPA1 were identified and subsequently resolved. Etiology and resolution of discrepancies for this cohort generally followed the categories reported for a largely overlapping subset of N = 2,578 donors (Williams et al. 2008).

Definition of serological and immunogenic epitopes of DPB1
Cano and Fernandez-Vina (2009) described two sets of dimorphic amino acid epitopes, at positions 56 and 85-87, that together accounted for the majority of DP serological reactivity observed in a sample of solid organ transplant patients. The first sequence dimorphism is found at DPB1 amino acid position 56, with either Ala (A) or Glu (E) at this site. The second variable region corresponds to amino acid positions 85-87 in the hypervariable region (HVR) "F," and these amino acids are in complete LD; most DPB1 alleles will have either the EAV (Glu-Ala-Val) or GPM (Gly-Pro-Met) motif in these positions. Combined, these two sequence dimorphisms were found to yield four serological specificities, defined as DP1 (56A; 85-87 EAV), DP2 (56E; 85-87 GPM), DP3 (56E; 85-87 EAV), and DP4 (56A; 85-87 GPM). In addition to standard analysis at the DPB1 allele level, analysis was performed individually for these amino acid motifs, as well as for the four serological epitopes.
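The assignment of the four serological specificities from the two dimorphisms can be written as a simple lookup. The following is a minimal sketch only; the function and table names are hypothetical and are not taken from the study's analysis code.

```python
# Lookup of the four DP serological specificities from the two DPB1
# sequence dimorphisms described in the text (position 56 and the
# position 85-87 motif). Names here are illustrative assumptions.
SPECIFICITY = {
    ("A", "EAV"): "DP1",
    ("E", "GPM"): "DP2",
    ("E", "EAV"): "DP3",
    ("A", "GPM"): "DP4",
}


def serological_specificity(residue_56, motif_85_87):
    """Return DP1-DP4 for a position-56 residue (A or E) and an 85-87
    motif (EAV or GPM), or None for any other combination."""
    return SPECIFICITY.get((residue_56.upper(), motif_85_87.upper()))


if __name__ == "__main__":
    print(serological_specificity("A", "GPM"))
```

Returning None for unlisted combinations keeps alleles that carry neither motif out of the four defined categories rather than forcing an assignment.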
An alternative epitope, recognized by T cells, was defined in Zino et al. (2004, 2007) and Fleischhauer et al. (2012) on the basis of alloreactive T cell cross-reactivity patterns and is therefore referred to as the T cell epitope (TCE). The TCE has so far not been mapped to defined structural amino acid residues, but was surmised to impact T cell alloreactivity via variable peptide presentation and shown to determine clinically nonpermissive mismatches for DPB1 in unrelated HSCT (Zino et al. 2004, 2007). The TCE is shared by defined subsets of DPB1 alleles, and allows alleles at this locus to be assigned to three categories of immunogenicity (highly immunogenic group 1 including DPB1*09:01, 10:01, 17:01; intermediately immunogenic group 2 including DPB1*03:01, 14:01, 45:01, 86:01, 104:01; and poorly immunogenic group 3 including most other alleles; Supplemental Table 2). In the present study, DPB1 alleles were categorized according to the TCE model outlined in Zino et al. (2004, 2007) and analyzed on that basis.

Hardy-Weinberg equilibrium
Fit of the data to Hardy-Weinberg expectations was assessed using both an exact test (Guo and Thompson 1992) and a standard goodness-of-fit (chi-squared) test implemented in the PyPop software package (Lancaster et al. 2003).
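The goodness-of-fit version of this test can be sketched in a few lines. This is a simplified illustration, not the PyPop implementation: it assumes unpooled genotype classes, computes only the chi-squared statistic and its degrees of freedom (no p value), and does not attempt the exact test of Guo and Thompson. The function name and input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hardy_weinberg_chi_squared(genotype_counts):
    """Chi-squared goodness of fit to Hardy-Weinberg proportions.

    genotype_counts maps (allele_a, allele_b) -> observed count.
    Returns (chi_squared, degrees_of_freedom).
    """
    observed = Counter()
    allele_counts = Counter()
    for (a, b), count in genotype_counts.items():
        observed[tuple(sorted((a, b)))] += count  # unordered genotype
        allele_counts[a] += count
        allele_counts[b] += count

    n = sum(observed.values())  # number of individuals
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freqs)

    chi_sq = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # Expected count: n*p^2 for homozygotes, n*2pq for heterozygotes.
        prob = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
        expected = n * prob
        chi_sq += (observed[(a, b)] - expected) ** 2 / expected

    k = len(alleles)
    # k(k+1)/2 genotype classes, minus 1, minus (k-1) estimated frequencies.
    return chi_sq, k * (k - 1) // 2


if __name__ == "__main__":
    # Hypothetical counts, not data from this study.
    print(hardy_weinberg_chi_squared({("A", "A"): 25, ("A", "B"): 50, ("B", "B"): 25}))
```

With many rare alleles, as at DPB1, expected counts in most genotype classes are very small, which is one reason an exact test is reported alongside the chi-squared test.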

Homozygosity statistic
The homozygosity F statistic of Watterson (1978), calculated as the expected proportion of homozygotes under Hardy-Weinberg, was used as a measure of the allele frequency distribution:

$$F = \sum_{i=1}^{k} p_i^2$$

where $p_i$ is the frequency of the ith allele at a locus. The test is based on the observed number of alleles (k) at a locus and sample size (2n). The homozygosity test was applied using the exact test described by Slatkin (1994) and implemented in the PyPop software package (Lancaster et al. 2003).
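As a worked illustration, the expected homozygosity under Hardy-Weinberg is the sum of squared allele frequencies. The sketch below computes only this observed F; it does not reproduce the exact test of Slatkin (1994), which compares F with its distribution under neutrality for the same k and 2n. The function name is a hypothetical choice.

```python
def homozygosity_f(frequencies):
    """Expected homozygosity under Hardy-Weinberg: the sum of squared
    allele (or epitope) frequencies. Frequencies should sum to 1."""
    total = sum(frequencies)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("frequencies must sum to 1")
    return sum(p * p for p in frequencies)


if __name__ == "__main__":
    # Position 56 frequencies reported in this study (Ala 0.576, Glu 0.424).
    print(round(homozygosity_f([0.576, 0.424]), 4))
```

For k categories F ranges from 1/k (perfectly even frequencies) to 1 (a single category), so values close to 1/k indicate a more even distribution than values close to 1.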

Haplotype and disequilibrium estimation
Haplotype estimation was accomplished using an expectation-maximization algorithm, which assigns population-level haplotype frequencies using simultaneous maximum likelihood estimation of n-locus haplotype frequencies. Haplotypes were estimated for DPA1 and DPB1 alleles, as well as for DPA1 alleles and DPB1 amino acid and T cell epitopes. A pairwise linkage disequilibrium statistic was calculated for each allele∼allele or allele∼epitope haplotype:

$$D_{ij} = x_{ij} - p_i q_j$$

where $x_{ij}$ is the estimated haplotype frequency and $p_i$ and $q_j$ are the ith and jth allele frequencies at the two loci. To account for differing allele or epitope frequencies at the loci, a normalized disequilibrium value was used:

$$D'_{ij} = \frac{D_{ij}}{D_{\max}}$$

where $D_{\max}$ is the lesser of $p_i q_j$ and $(1-p_i)(1-q_j)$ when $D_{ij} < 0$, and the lesser of $p_i(1-q_j)$ and $q_j(1-p_i)$ when $D_{ij} > 0$.
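The pairwise statistic and its normalization can be sketched as follows, taking the estimated haplotype frequency and the two allele frequencies as inputs. The function name and the example values are hypothetical, and returning 0 when the normalizing term is zero is an assumption made here for convenience.

```python
def pairwise_ld(x_ij, p_i, q_j):
    """Return (D, D') for one haplotype.

    x_ij: estimated haplotype frequency
    p_i, q_j: allele (or epitope) frequencies at the two loci
    """
    d = x_ij - p_i * q_j
    if d < 0:
        d_max = min(p_i * q_j, (1 - p_i) * (1 - q_j))
    else:
        d_max = min(p_i * (1 - q_j), q_j * (1 - p_i))
    # Assumption: report D' = 0 when D_max is 0 (a monomorphic locus).
    d_prime = d / d_max if d_max > 0 else 0.0
    return d, d_prime


if __name__ == "__main__":
    # Hypothetical frequencies: allele i always occurs with allele j.
    print(pairwise_ld(0.2, 0.2, 0.3))
```

Because D is divided by a positive quantity, the sign of D' follows the sign of D, so D' ranges from -1 (the haplotype is never observed) to +1 (the rarer allele is always found with the other).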
A global disequilibrium statistic was also used to summarize disequilibrium at all possible haplotypes for two loci (Klitz et al. 1995):

$$W = \left[\sum_{i=1}^{k}\sum_{j=1}^{l} \frac{D_{ij}^2}{p_i q_j}\right]^{1/2}$$

where $p_i$ and $q_j$ are the observed allele frequencies at each of the two loci having k and l alleles, respectively. A normalized W was calculated to address differing numbers of alleles at the different loci:

$$W_n = \frac{W}{\sqrt{\min(k-1,\, l-1)}}$$

where k and l are the number of alleles at the two loci (Cohen et al. 1988). The values of W_n fall between 0 and 1, and W_n is identical to Cramer's V (Cramer 1946). Haplotype estimation was accomplished using the "haplo.em" function in the "haplo.stats" package (Sinnwell and Schaid 2009) for the R language for statistical computing (R Core Development Team 2009), version 2.9.2. All LD values were computed using the "ldkl" function in the "gap" package (Zhao and Tan 2009) for R. Clustered heatmaps of LD values were generated using the "heatmap" function in the base "stats" package for R, which utilizes a hierarchical similarity clustering procedure.
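The global statistic can be illustrated with a short sketch that takes a table of estimated two-locus haplotype frequencies. This is not the "gap" package routine used in the study. As a simplifying assumption, allele frequencies are taken as the marginals of the haplotype table rather than supplied separately, and the function name is hypothetical.

```python
from math import sqrt


def global_ld(hap_freqs):
    """Return (W, W_n) for a two-locus haplotype frequency table.

    hap_freqs maps (allele_i, allele_j) -> estimated haplotype frequency;
    the frequencies are assumed to sum to 1.
    """
    p, q = {}, {}
    for (i, j), x in hap_freqs.items():
        p[i] = p.get(i, 0.0) + x  # marginal frequency at locus 1
        q[j] = q.get(j, 0.0) + x  # marginal frequency at locus 2

    if min(len(p), len(q)) < 2:
        raise ValueError("each locus needs at least two alleles")

    w_squared = 0.0
    for i, p_i in p.items():
        for j, q_j in q.items():
            if p_i * q_j == 0:
                continue  # skip alleles with zero estimated frequency
            d_ij = hap_freqs.get((i, j), 0.0) - p_i * q_j
            w_squared += d_ij ** 2 / (p_i * q_j)

    w = sqrt(w_squared)
    w_n = w / sqrt(min(len(p) - 1, len(q) - 1))
    return w, w_n


if __name__ == "__main__":
    # Hypothetical table with complete association between two loci.
    print(global_ld({("a1", "b1"): 0.5, ("a2", "b2"): 0.5}))
```

W squared equals the chi-squared statistic of the haplotype table divided by the number of chromosomes, which is why the normalized value coincides with Cramer's V.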

DPA1 and DPB1 allele frequency distributions
Genotypic distributions for both DPA1 and DPB1 do not differ significantly from expectations under Hardy-Weinberg equilibrium. Allele frequencies for DPA1 are given in Table 1. In this population, two of the nine DPA1 alleles observed in the study cohort account for >95 % of the variation at this locus, with DPA1*01:03 f = 0.819 and DPA1*02:01 f = 0.140. Of the remaining alleles, only DPA1*02:02 is detected at a frequency greater than 1 %, and three of the nine alleles observed are found only once among the 11,888 chromosomes sampled here.
As expected, more diversity is observed for DPB1 (Table 2), where 33 distinct DPB1 alleles are seen in this cohort, but a single allele predominates and a handful of other alleles are present at moderate frequencies. Observations are consistent with those in numerous other European Caucasian populations, where DPB1*04:01 is the single most common allele (f = 0.439). Three remaining [...] Similar to the results for DPA1, a little less than one third (9 of 33) of the alleles are observed only as singleton copies.

DPB1 serological epitopes
The frequencies in the study population of the amino acid epitopes at position 56 and positions 85-87 are shown in Table 3. Position 56 is marked by extremely balanced frequencies of 0.576 for alanine at this position and 0.424 for glutamic acid. The amino acids within the second motif examined here, at positions 85-87 in the HVR "F," are in complete LD, with one of two epitopes in all DPB1 alleles in this sample: either EAV (f = 0.30) or GPM (f = 0.70). Together, these two sequence dimorphisms correspond to four serological specificities: DP1, DP2, DP3, and DP4; the frequencies of these are given in Table 4.

DPB1 TCE
When the data for this cohort are examined with respect to the TCE immunogenicity groups defined by Zino et al. (2004, 2007), it is apparent that the immunogenic TCE group 1 and 2 alleles together account for only 15.2 % of observed alleles (Table 5). TCE group 1 alleles all bear the "EAV" motif in positions 85-87, and are all in LD with DPA1*02:01. However, neither the 85-87 EAV motif nor the presence of DPA1*02:01 was specific for alleles from TCE group 1, as both are also found in some alleles from TCE groups 2 and 3 (Table 5 and Supplemental Table 2).

Homozygosity statistic
Calculation of Watterson's homozygosity statistic (F) for both DPA1 and DPB1 reveals that the allele frequency distributions for both loci do not differ significantly from expectations under a neutral model (Table 6). In contrast, when the frequency distributions for DPB1 are analyzed with respect to the four serological specificities, DP1-DP4 (Cano and Fernandez-Vina 2009), they are significantly more even than expected under neutrality (p<0.005), suggesting balancing selection for these specificities (Table 6). The frequency distributions of DPB1 alleles in the three TCE groups, on the other hand, do not depart significantly from neutral expectations.
These results for LD between DPA1 alleles and DPB1 amino acid motifs stand in stark contrast to the patterns of LD observed between amino acid motifs within DPB1.

Discussion
The high levels of linkage disequilibrium between DPA1 and DPB1 suggest the possibility of nonpermissive combinations for the heterodimer, similar to that suggested for the DQ molecule. It has been shown that for the heterodimer encoded by DQA1-DQB1, certain alpha-beta combinations are unstable at the cell surface, and these have been associated with the patterns of LD for these genes. The patterns of LD observed here are consistent with the notion that particular combinations of DP alpha and beta chains may be structurally impermissible. In the DP heterodimer, positions 85-87 are thought to be important primarily in interaction with the alpha chain, as well as participating in the P1 pocket (Diaz et al. 2003). This is analogous to the more well-characterized DRB1 protein structure: DPB1 position 84, in LD with positions 85-87, corresponds to DRB1 position 86, which is thought to contribute to both dimer stability (Verreck et al. 1993) and the position of bound peptide in the P1 pocket, impacting the MHC-peptide conformation (Wu and Gorski 1997).
While the DPB1 locus is significantly less polymorphic than DRB1, its patterns of LD and amino acid variation lend further support to the importance of the P1 pocket in driving DPA1∼DPB1 LD. In contrast to DPB1, DPA1 polymorphism is extremely limited and restricted to only a handful of amino acid sites. Examination of the amino acid sequences for the DPA1 alleles reveals that a single amino acid polymorphism at position 31 [methionine (M) or glutamine (Q)] subdivides the alleles at this locus along the lines of the patterns of LD (Table 1). Position 31, like positions 85-87 in DPB1, participates in the P1 pocket. Figure 2 shows the crystal structure for DP2 (Dai et al. 2010) with position 31 on DPA1 and positions 84-87 for DPB1, and their side chains, highlighted. The structure makes clear the critical role of these residues in the P1 pocket of the peptide-binding region, as well as in the interaction between the alpha and beta chains. While beta chain positions 84-87, located within the peptide-binding region alpha helix, are antigenic and most likely in contact with bound peptide and the T cell receptor, position 31 on the alpha chain forms part of the beta-pleated sheet that forms the floor of the peptide-binding groove and is not exposed to the TCR or solvent.
The critical role of position 31 in determining LD with DPB1 alleles is illustrated by the rare DPA1*01:06:02 allele, first characterized in an individual from Kenya (Peterson et al. 2008). The novel allele was initially detected due to the observation of heterozygosity at position 31 (methionine→glutamine) in a genotype otherwise consistent with DPA1*01:03 homozygosity. Glutamine at position 31 is one of two amino acid positions delineating the DPA1*02:01 and 02:02 alleles, which are in near complete LD with DPB1 alleles bearing the position 85-87 "EAV" motif. Interestingly, the Kenyan individual in whom this allele was identified was heterozygous at DPB1, with the "EAV" motif present in one allele, DPB1*01:01:01, but not the other, DPB1*02:01:02. Here, we find that very little LD exists between the pair of sequence dimorphisms at positions 56 and 85-87, corresponding to the serological specificities described above, with W_n = 0.242.
While the allele frequency distributions for DPA1 and DPB1 do not differ significantly from neutral expectations, the frequencies for the broad serological types DP1-DP4 are significantly more even than expected under neutrality, suggesting balancing selection for these specificities. These findings are in keeping with those from other studies in multiple human populations, where, unlike other class II loci, the DPB1 locus did not show evidence for balancing selection at the allele level. In most populations studied to date, DPB1 frequency distributions did not differ significantly from expectations under neutrality (Begovich et al. 2001; Salamon et al. 1999; Solberg et al. 2008), or showed evidence of directional or purifying selection (Pérez-Miranda et al. 2004). However, when the data in Salamon et al. (1999) were examined at the amino acid level, several amino acid sites were found to have significantly balanced polymorphism; notably, the most balanced sites were found to be positions 55-56 and 84-87, consistent with the findings in the present study. Salamon et al. concluded that selection may be operating at the amino acid level in DPB1, and that because this locus is characterized by polymorphism primarily related to gene conversion events, resulting in lower overall polymorphism, this selective effect is masked at the allele level. More recent work by Mack (2011, personal communication) has confirmed that positions 55-56 and 84-87 appear to be particularly balanced for DPB1 in most populations worldwide, regardless of whether the DPB1 allele frequencies are even or directionally skewed in the population.
Strikingly, there is more evidence of recombination between DPB1 positions 56 and 85-87 (W_n = 0.28), within the gene, than between DPA1 and DPB1 positions 85-87 (W_n = 0.65), i.e., between two adjacent genes. The finding of minimal LD between DPB1 positions 56 and 85-87 is in keeping with numerous studies describing evidence for a history of extensive recombination within the DPB1 locus. While gene conversion and recombination are thought to be important factors in HLA polymorphism, Buhler and Sanchez-Mazas (2011) have noted that HLA-DPB1 appears to have been particularly impacted by gene conversion relative to other HLA loci, and DPB1 alleles are much more closely related to each other than alleles of other loci; the authors concluded that the patterns of allele and amino acid frequency distributions in world populations show evidence of ancient, rather than recent, balancing selection.
It is interesting to note that the position 56 and 85-87 motifs appear to be characteristics of a specific DP supertype defined by an unusually similar peptide-binding motif (Sidney et al. 2010) and identified among most common DPB1 alleles in most human populations. While the supertype largely shares specificity in the main P6 pocket of the peptide-binding region, the position 85-87 motif impacts P1 specificity. An alternative peptide-binding motif for DPB1*09:01 (Dong et al. 1995) is markedly different from that for the common DP supertype that includes DPB1*02:01 and DPB1*04:01. Interestingly, DPB1*09:01 is the prototype allele of the highly immunogenic TCE group 1 in the original paper by Zino et al. (2004). The TCE group 1 alleles all possess the EAV motif at positions 85-87, and in this study were always observed with DPA1*02:01, consistent with observations for all DPB1 alleles with this motif. It is tempting to speculate that the strong alloreactivity to this TCE group demonstrated both clinically and in mixed lymphocyte culture reactions in vitro (Sizzano et al. 2010;Crocchiolo et al. 2009;Zino et al. 2004;Fleischhauer et al. 2012) could be correlated with the presence of this motif in association with the DPA1 linkage. Likewise, the very strong association of DPB1 with CBD has been pinpointed to a specific role for glutamic acid at amino acid position 69, suggesting that for DPB1 the amino acid residue, rather than allelic identity, may be the important unit of association in human disease.
Taken together, the data in this study suggest that for the DPA1-DPB1 heterodimer, the unit of selection is the combined amino acid epitope contributed by both the DPA1 and DPB1 genes, rather than the allele, and that patterns of LD are driven primarily by dimer stability and conformation of the P1 pocket. This may help explain the differential pattern of allele frequency distribution observed for this locus relative to the other class II loci. These findings further support the notion that allele-level associations in disease and transplantation may not be the most important unit of analysis, and that they should be considered instead in the molecular context.