Background

Class I major histocompatibility complex (MHC) molecules bind short peptides derived from the processing of proteins, and present them on the cell surface for T cell scrutiny. Functional MHC molecules are made of a heavy (α) chain and a β2-microglobulin chain. Peptide binding by class I molecules is accomplished by interaction of the peptide amino acid side chains with discrete pockets within the peptide-binding groove of the MHC molecule formed by the α1 and α2 domains of the heavy chain. Typically, in the case of human leukocyte antigen (HLA) class I, the main binding energy is provided by the interaction of residues in position 2 and the C-terminus of the peptide with the B and F binding pockets of the MHC molecule, respectively [1–8], although side chains throughout the ligand can have a positive or negative influence on binding capacity. The common chemical specificity of the peptide side chains amongst ligands bound by a specific MHC molecule is termed the binding motif [9].

MHC molecules are extremely polymorphic, and over a thousand allelic variants have already been described at the class I A and B loci. Most of the polymorphism is located in the peptide-binding region, and as a result each variant is believed to bind a unique repertoire of peptide ligands. Despite this polymorphism, HLA class I molecules can be clustered into groups, designated as supertypes, representing sets of molecules that share largely overlapping peptide binding specificity. Each supertype can be described by a supermotif that reflects the broad main anchor motif recognized by molecules within the corresponding supertype. For example, molecules of the A2-supertype share specificity for peptides with aliphatic hydrophobic residues in position 2 and at the C-terminus, while A3-supertype molecules recognize peptides with small or aliphatic residues in position 2 and basic residues at the C-terminus.

The first supertypes were described in the mid-1990s by our group [10]. Using motifs derived from binding data or the sequencing of endogenously bound ligands, along with simple structural analyses, nine different supertypes were defined [11, 12]. Initial peptide binding studies allowed the identification of several peptides with degenerate binding capacity for the A2- [13–15], A3- [16–18] and B7-supertypes [19, 20]. In subsequent years, additional binding data have been published to confirm the B44- [21], A1- and A24-supertypes [22].

Over time, a number of different and more sophisticated computational approaches have been implemented by others to classify HLA class I alleles into clusters or supertypes [23–32]. While the resulting classifications are in general agreement with the original one, additional supertypes, representing either sub-clusters of the original nine or completely novel groupings, have been proposed [23, 32]. Despite differences in various classification schemes, the concept of HLA supertypes has been effectively used to characterize and identify promiscuously recognized T cell epitopes from a variety of different disease targets, including measles-mumps-rubella [33], SARS [34], EBV [35, 36], HIV [37–41], Kaposi's Sarcoma Associated Human Herpesvirus [42, 43], HCV [16, 44], HBV [45, 46], HPV [47], influenza [48], P. falciparum [49–52], LCMV [53], Lassa virus [54], F. tularensis [55], vaccinia [56–58], and also cancer antigens [59–68]. Supertypes have also been utilized as a component in several approaches and algorithms designed for predicting peptide candidates with degenerate HLA class I binding capacity [69–77]. Finally, supertypes have also been examined as a variable in studies of disease association, rates of susceptibility, and outcome [37, 78–86].

The original classification proposed by Sette and Sidney [12] comprised about 100 different class I MHC alleles. However, over the last 10 years, substantially more binding data has been generated, and through the efforts of several seminal MHC-related databases, including SYFPEITHI [87], HLA Ligand [88], FIMM [89] and MHCBN [90], MHC binding motif information is readily accessible. Up-to-date compilations of MHC sequence data are also readily available in the IMGT database [91]. Herein, we analyze the complete list of alleles available through IMGT (release 2.9) using a simple approach largely based on compilation of published motifs, binding data, and analyses of shared repertoires of binding peptides, in conjunction with clustering based on the primary sequence of the B and F peptide binding pockets. As a result, we now provide updated supertype assignments, with new assignments for about 700 different HLA-A and -B alleles. This will permit the use of our original classification scheme based on current data and its comparison to alternative supertype classification methodologies developed since then.

Results

MHC sequences and peptide binding pockets

The α1 and α2 domain residues comprising the B and F peptide-binding pockets of 945 HLA-A and -B class I molecules were extracted and aligned. Based on crystallographic studies [1, 4, 6, 8], residues 7, 9, 24, 34, 45, 63, 66, 67, 70, and 99 were considered as comprising the B pocket, which engages the peptide residue in position 2. The residues comprising the F pocket, which engages the peptide C-terminal residue, were defined as 74, 77, 80, 81, 84, 95, 97, 114, 116, 123, 133, 143, 146, and 147. Further, for the B pocket we defined the subset of residues 9, 45, 63, 66, 67, 70, and 99 as key residues. In the F pocket, the key residues are those in positions 77, 80, 81 and 116.
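The pocket definitions above can be written down as a short sketch. The position lists are taken directly from the text; the function name `pocket_residues`, and the assumption that each allele is supplied as an ungapped mature heavy chain sequence in which residue 1 sits at string index 0, are illustrative choices and not part of the original analysis.

```python
# Pocket positions as listed in the text (1-based heavy chain numbering).
B_POCKET = (7, 9, 24, 34, 45, 63, 66, 67, 70, 99)
F_POCKET = (74, 77, 80, 81, 84, 95, 97, 114, 116, 123, 133, 143, 146, 147)

# Key residue subsets as listed in the text.
B_KEY = (9, 45, 63, 66, 67, 70, 99)
F_KEY = (77, 80, 81, 116)


def pocket_residues(sequence, positions):
    """Return the residues found at the given 1-based positions.

    Assumption: `sequence` is an ungapped mature heavy chain sequence,
    so that residue 1 is sequence[0].
    """
    if max(positions) > len(sequence):
        raise ValueError("sequence is too short to cover the pocket positions")
    return "".join(sequence[p - 1] for p in positions)
```

Applied to one allele, `pocket_residues(seq, B_POCKET)` gives a 10-residue string and `pocket_residues(seq, F_KEY)` a 4-residue string, which can then be compared across alleles.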

Pocket chemical specificity

Next, we defined the broad chemical specificity associated with the different B and F pockets. Table 1 summarizes the amino acid residues generally associated with each particular description of binding specificity. The particular set of residues associated with each use of a descriptor may vary with context. While these descriptions largely follow classical textbook definitions, they also consider our own historical use and experience with peptide binding studies. For example, in our analyses we often consider the polar residue Q along with more classically defined aliphatic residues based on observed behavior in MHC peptide binding studies [15, 92, 93]. It is also important to note that some residues are assigned to multiple categories, as their chemical specificity is compatible with different emphases. For example, L would be considered as both aliphatic and hydrophobic. The chemical specificities defining each supertype are listed in Table 2.

Table 1 Physiochemical functionality of peptide side chains.
Table 2 HLA supertype specificity descriptions.

Peptide-binding motif-based supertype assignments

As a starting point, we have largely utilized the nine supertype designations derived previously [11, 12], with a few exceptions. These exceptions entail specificities that appear to be compatible with multiple supertypes. More specifically, we recognize that some alleles have repertoires overlapping both the A01 and A03 supertypes (e.g., A*3001; see Hamdahl et al., IEDB submission 1000945 [94–96]), and others with the A01- and A24-supertypes (e.g., A*2902 [22]). Also, because it utilizes a non-canonical main anchor spacing, and as a result appears to have a repertoire that overlaps with other specificities, we have presently placed HLA B*0801, and other alleles sharing sequence and serological antigen similarities, in a separate cluster. While these B*08 alleles utilize a unique mode of peptide binding, it is likely that in most cases their repertoires overlap significantly with other supertypes, especially the B07-supertype (Sidney, Frahm, Brander and Sette, unpublished observations). As a result, we have not defined this cluster as a separate supertype.

Next, the available peptide-binding motifs were compiled. In total, 88 different class I motifs were identified. Motif information was derived from our own peptide-binding studies, or from the published scientific literature as compiled in the SYFPEITHI database [87]. The basis for each motif assignment is listed in Tables 3 and 4 for HLA-A and -B alleles, respectively. Corresponding supertype associations were assigned based on the criteria listed in Table 2.

Table 3 HLA-A motif/structure reference panel alleles.
Table 4 HLA-B motif/structure reference panel alleles.

Pocket analysis and supertype assignments

The amino acids comprising the B and F peptide binding pockets were compiled for all alleles for which complete sequence information was available. The positions of specific MHC residues forming the corresponding pockets are listed above. A reference panel of B and F pockets was generated to include all alleles for which the MHC-peptide binding specificity has been defined (see Tables 3 and 4). These B and F pocket structures, along with the associated alleles and corresponding binding specificity are shown in Tables 5 and 6, respectively.

Table 5 B pocket structures of reference panel alleles.
Table 6 F pocket structures of reference panel alleles.

Next, the B and F pocket structures of alleles whose peptide binding specificity was unknown were compared against the sequences in the reference panel. For each case, an attempt was made to find an exact match with the full set of residues in the corresponding pocket. If a match was identified, the allele was assigned the associated specificity shown in Tables 5 and 6. If an exact match could not be found, then a match with a key residue sequence was attempted. If no match was identified, the allele was considered unassigned. For each allele where a match for the full or, secondarily, key residue sequence could be identified for both the B and F pockets with any of the alleles indicated in Tables 3 and 4, a corresponding HLA supertype was assigned.
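The two-tier matching just described can be sketched as follows. This is a minimal illustration under stated assumptions: the names `ReferencePocket`, `match_pocket` and `stringency` are hypothetical, the toy panel uses made-up pocket strings and not real HLA sequences, and when several reference entries match the sketch simply returns the first hit (the text does not say how such ties were handled).

```python
from typing import NamedTuple


class ReferencePocket(NamedTuple):
    full: str         # residues at all positions of the pocket
    key: str          # residues at the key positions only
    specificity: str  # specificity label taken from the reference panel


def match_pocket(full, key, panel):
    """Return (specificity, level), where level is 'exact', 'key' or None."""
    # First attempt: exact match over the full set of pocket residues.
    for ref in panel:
        if ref.full == full:
            return ref.specificity, "exact"
    # Second attempt: match over the key residues only.
    for ref in panel:
        if ref.key == key:
            return ref.specificity, "key"
    # No match at either level: the pocket remains unassigned.
    return None, None


def stringency(b_level, f_level):
    """Describe the match tier for an allele from its B and F pocket levels."""
    levels = {b_level, f_level}
    if None in levels:
        return "no match at one or both pockets"
    if levels == {"exact"}:
        return "exact match at B and F pockets"
    if levels == {"key"}:
        return "key residue match at B and F pockets"
    return "one exact and one key residue pocket match"


# Toy panel with made-up pocket strings (hypothetical, for illustration).
PANEL = [
    ReferencePocket("AAAA", "AA", "spec-1"),
    ReferencePocket("CCCC", "CC", "spec-2"),
]
```

A supertype would be assigned only when `match_pocket` returns a specificity for both the B and the F pocket; the tier returned by `stringency` mirrors the groupings used to color-code alleles in the figures.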

Of the 945 sequences analyzed, matches at both B and F pockets were found for 764 (80.8%) (Table 7). Notably, for the majority (57%) of alleles not in the reference panel, full sequence matches were identified at both the B and F pockets. Figures 1 and 2 indicate the alleles associated with each HLA-A or -B supertype, respectively. Conversely, as an index, Additional file 1 provides the supertype assignment for each of the 945 HLA-A and -B alleles examined, along with their respective B and F pocket structures.
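As a quick consistency check on these counts, the arithmetic can be reproduced in a few lines. The variable names are illustrative; the two input numbers are those given in the text.

```python
total_alleles = 945          # sequences analyzed
matched_both_pockets = 764   # alleles with matches at both B and F pockets

# Alleles left without a match at one or both pockets.
unmatched = total_alleles - matched_both_pockets

# Percentage of alleles matched at both pockets, to one decimal place.
percent_matched = round(100 * matched_both_pockets / total_alleles, 1)

print(unmatched, percent_matched)  # 181 80.8
```

The remainder of 181 alleles agrees with the number of alleles for which pocket matches could not be obtained.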

Table 7 Quantification of supertype assignments.
Figure 1 Supertype classification of HLA-A alleles. The alleles associated with each HLA-A supertype, multiple supertypes, or that are unclassified, are shown. Under each supertype, alleles are grouped (by color) on the basis of the stringency of selection: experimentally established motif (i.e., reference panel) (green), exact match(es) in the B and F pockets (white), one exact and one key residue pocket match (yellow), key residue match(es) at B and F pockets (grey). Alleles with no match at one or both pockets are listed in red font.

Figure 2 Supertype classification of HLA-B alleles. The alleles associated with each HLA-B supertype, multiple supertypes, or that are unclassified, are shown. Under each supertype, alleles are grouped (by color) on the basis of the stringency of selection as described in the legend to Figure 1.

At the HLA-A locus, alleles were fairly evenly distributed amongst the four supertypes we have defined. Considering the alleles with broad, or dual, specificity (i.e., those assigned as A01–A03 or A01–A24), the minimum was 57 alleles for A24, and the maximum was 95 alleles for A03. 75 alleles were assigned to the A02 supertype, and 80 to the A01 supertype. At the HLA-B locus, the B07-supertype is the largest, with 165 members. 19 alleles were associated with the B08 pattern. As noted above, we hypothesize that alleles in this cluster will likely cross-react with other supertypes (especially B07), so we do not consider this group as a distinct supertype. The B58 supertype was assigned only 22 alleles, making it the smallest supertype.

Novel supertypes could not be identified, but several alleles have specificities spanning two different supertypes

From Tables 5 and 6, it is evident that not all possible combinations of B and F pocket specificities are present within the reference panel. For example, there are no alleles associated with a preference for acidic residues in position 2 and basic residues at the C-terminus. Hence, we were interested to see if combinations not found in the original supertype assignments were revealed in the present analysis. Interestingly, we did not identify any new combinations. It is possible that new specificities will be identified from amongst the set of 181 alleles for which pocket matches could not be obtained. However, we also note that in the majority of these cases, matches could be made if one allowed a single conservative residue change (e.g., E to D). Thus, based on the analyses done to date, it would appear that the set of supertype specificities currently identified will cover virtually all HLA-A and -B class I MHC alleles.

In addition to A*2902 and A*3001, the present analysis has identified 17 alleles expected to have a specificity overlapping two supertypes. Nine alleles matched pocket preference patterns that would be compatible with both the A01- and A03-supertypes, and another 10 with the A01- and A24-supertypes. In the former case, the association was established on the basis of pocket matches with A*3001. Not surprisingly, several, but not all, of the alleles assigned the A01–A03 specificity also share the A30 antigen. The A01–A24 cross-reactivity pattern was established on the basis of pocket matches with A*2902. All of the alleles associated with the A01–A24 pattern also shared the A29 antigen.

Unexpected supertype assignments

Typically, the supertype assignment for a particular allele follows the predominant assignment for other alleles sharing the same serological antigen. With this in mind, we did identify several instances where the current classification differed from what was expected based on serology or previous classification (Table 8). Of these, the more striking ones involve alleles for which the assigned supertype represents a non-conservative change in the binding specificity from what would have been expected. For example, A*0265 and A*0280, which are associated with the A02 serological antigen, and thus might be expected to have an F pocket specificity for hydrophobic/aliphatic residues, were found to have an F pocket matching that of A03 supertype alleles, which have a specificity for positively charged residues. Similarly, B*4012 and B*4440 were found to have B27 supertype B pockets, associated with a preference for positively charged residues, when a B44 supertype specificity for acidic residues might have been expected.

Table 8 Unexpected and/or revised supertype assignments.

As noted above, in the present analysis, and contrary to what was done in the original analysis, we did not make supertype assignments for alleles without an exact or key match by considering conservative substitutions. As a result of this more conservative approach, we now no longer assign supertypes to some alleles previously given a supertype designation (Table 9). In the majority of these cases, however, the original designation would still seem to be a reasonable assumption.

Table 9 Alleles reclassified as "unassigned".

Discussion

In the present study we have attempted to classify almost 1000 HLA-A and -B class I alleles into supertypes. This is nearly a 10-fold increase in the number of alleles compared to our original classification done about a decade ago [12]. Besides providing supertype assignments for considerably more alleles, the present report has attempted to make more transparent how the original "phenomenological" classifications were done. About 80% of the 945 alleles examined were classified into one of the nine supertypes identified previously. Analysis of B and F pocket specificity patterns did not suggest the existence of any novel supertypes.

HLA supertypes do not necessarily demarcate groups of alleles with completely non-overlapping repertoires. A binding repertoire overlapping multiple supertypes has been demonstrated previously, for example, in the cases of A*2902 [22] and A*3001 (see Hamdahl et al., IEDB submission 1000945 [94–96]). In the present study we have identified 17 other alleles that would appear to have specificities that bridge either the A01 and A03 supertypes, or the A01 and A24 supertypes. At the same time, individual peptides can be readily identified that bear a particular supermotif, but that do not bind individual HLA allele members of the supertype, or that bind alleles of other supertypes, even supertypes associated with a different locus. Typically, in the first case, these phenomena result from differences in motif compatibility, perhaps at secondary positions. The second case likely reflects overlap(s) between the supertypes in terms of specificity, although in rare cases binding can be accomplished when no main anchor motif compatibility is apparent.

These observations are exemplified by a large scale analysis of the capacity of a non-redundant set of 252 known EBV and HIV derived epitopes to bind a panel of 30 different HLA class I A and B molecules (Sidney, Frahm, Brander and Sette, unpublished observations). It was found that about 21% of the peptides bearing a specific supermotif bound a given allele in the corresponding supertype with an affinity of 100 nM, or better. By contrast, only in 1% of the cases considered did an allele bind a peptide that did not have the corresponding supermotif. At the same time, it was noted that in the set of peptides utilized 62% (155/252) have motifs associated with 2 or more supertypes. The pattern of binding also followed this general promiscuity. It is also significant to note that when the same library of peptides was examined for recognition in HIV/EBV patients, it was found that ~95% of the epitopes were recognized in individuals not expressing the allele the epitope was originally reported to be restricted by, and the promiscuity more often than not involved an allele outside of the supertype associated with originally described restricting allele [97]. Thus, it is apparent that the lines of demarcation between supertypes can be fuzzy from the perspective of both the allelic specificity and the peptide motif.

Restriction outside or across supertypes can also originate from overlaps in supermotifs (e.g., A02 and B62), or from alleles such as B*0801 which do not utilize the typical P2/C-terminus anchor spacing. B*0801 utilizes P3 and P5, not P2, and as such may be compatible with several supertypes and alleles. Thus, an A02- or A24-supertype epitope cross-reacting with B*0801 is not an example of a motif "failure", but merely reflects the fact that the specific peptide has both motifs.

It is important to emphasize that supertypes are based on MHC binding, and that MHC binding alone is not a sufficient criterion for T cell recognition. Indeed, hundreds of examples of peptides that bind with remarkably high affinity, but that are not recognized by T cells, have been reported in the literature. We note that even in the best affinity ranges (i.e., IC50 <10 nM), rarely more than 10% of the peptides can be expected to be recognized [98]. Similarly, binding affinity does not necessarily correlate with frequency of recognition [99]. It is true that the trend is towards the most frequently recognized peptides being also the highest affinity binders [38, 50, 100], but that is not always the case, and there are clearly cases where the dominant epitope has an IC50 of ~100 nM, while several other non-recognized peptides have affinities in the <10 nM range.

It must also be emphasized that membership of an epitope in a supertype is not sufficient to guarantee its recognition by T cells in the context of different MHC alleles. Peptide binding to MHC is an absolute requirement for an epitope to be recognized by T cells. At the same time, many other factors, including protein expression and processing, as well as T cell repertoire and the specific MHC context, come into play in determining whether a peptide will be an epitope or not, or whether an epitope will be promiscuously recognized within a specific supertype. For example, Goulder and co-workers, studying B7-supertype epitopes, found that differential selection pressure exerted on HIV by CTL targeting identical epitopes, but restricted by distinct HLA alleles from the same supertype, can result in significant functional differences [101]. Macdonald et al., looking at two B44 subtypes described as members of the HLA B44-supertype, reported that a naturally selected dimorphism between the two molecules alters class I structure, peptide repertoire, and T cell recognition [102].

The intent of the current study was to derive an updated classification of HLA class I MHC alleles on the basis of primary anchor specificity. For the vast majority of HLA class I molecules whose binding specificity has been described by crystal structure, pool sequencing or peptide binding studies, the main anchor interactions of the peptide almost invariably involve the MHC B and F pockets, while other pockets likely dictate secondary interactions. This pattern also appears to be true for most macaque and chimpanzee class I alleles studied to date.

There are exceptions, however, and indeed we have not assigned B*08 alleles to a specific supertype in recognition of the fact that these alleles appear to utilize pockets other than the B pocket as a primary anchor contact. For HLA class I molecules in general, the B and F pockets are the most likely main anchor contacts, while other pockets likely dictate secondary interactions. High levels of cross-reactivity have been experimentally demonstrated in the case of 6 supertypes for alleles that vary at secondary pockets [15, 17, 20–22].

By contrast, in the murine system it is well recognized that other pockets are often the important primary peptide contacts [103–109]. Thus, to utilize the classification approach described here in the context of other species, additional pockets may need to be considered. It is also likely that the further parsing or sub-classifying of supertypes on the basis of secondary interactions can be accomplished.

In the case of HLA-B alleles, F pocket specificity is difficult to correlate with a specific sequence, as a diverse pattern of residues appears to be associated with similar binding specificity. Independent of the residues in the F pocket, most HLA-B alleles seem to bind hydrophobic residues. Thus, assignment of B alleles was primarily driven by the specificity exhibited by the B pocket. On the other hand, it is also possible that greater resolution in the F pocket could be achieved as more data become available to discriminate different preference patterns. For example, in the B7 supertype it is apparent that some alleles, like B*3501, prefer large hydrophobic residues, such as Y. Conversely, other B7 supertype alleles, such as B*5401, seem to prefer small hydrophobic residues, such as A, at the C-terminus. While we have noted these subtle differences in preference [20], in practice we have not found that they significantly impact cross-reactivity between the alleles. This perhaps suggests that the C-terminal anchor in some contexts is less important, and that shared secondary preferences can have a stronger influence on degenerate binding capacity than in other cases. At the same time, it may be necessary to consider additional key residues in the analysis of the F pocket. This is exemplified in the cases of A*2603 and A*0301 which have the same key F pocket residues, but which are associated with much different specificity.

The vast majority of HLA-A and -B alleles fall into one of the 9 supertypes we have described. There are likely reasons for this [110], which include evolutionary relationships, but also constraints and limitations inherent in the epitope processing infrastructure. For example, no allele has been identified to date that binds peptides with D, E, Q or P at the C-terminus. This is in congruence with the preferences of both proteasomal cleavage and TAP transport [111], and the observation has been applied in the rational design of an in vitro test reagent tool (PeptGen) offered by the Los Alamos HIV Sequence Database [112].

Supertype classification should not be taken to necessarily imply an evolutionary relationship. In some cases this is largely true, as for example in the case of the A2-supertype, where most alleles are associated with the A2 serological antigen. In other cases the relationship is more complicated, such as the gene conversion relationship between the A2, A3 and A68 antigens. This latter example is somewhat of the exception that proves the rule. Supertype associations are based on shared binding specificity, which may result from both common ancestry and convergent evolution [110]. Thus, while alleles within a supertype may have a close evolutionary relationship, that is not a given. Also, alleles (supertypes) sharing specificity at one anchor position may be associated with very disparate specificities at the other.

Other groups have also utilized various methodologies to define supertypes. In general, our classification is in agreement with those derived by other approaches, as compiled by Hertz [32], Lund [23] and Tong [31]. This is not surprising given the good agreement observed between our initial dataset and other classifications, and that the methodology utilized here is not different from the one utilized to derive the original assignments. If there are variations, they usually represent the splitting of a supertype, or reassignment of individual alleles. As in any classification problem of this kind, there is no absolute truth in supertype assignments. The practical application of supertype classification schemes to identify degenerately binding peptides will ultimately show what classification scheme has the most practical value.

Conclusion

The present study represents an update to the HLA class I supertype classification originally described almost a decade ago. Using MHC peptide binding motif data and MHC sequence information that has since become available, supertype associations have now been provided for over 750 HLA-A and -B alleles. In addition, the approach utilized has been made more transparent, allowing others to utilize the classification approach going forward.

Methods

MHC sequences

The sequences of HLA A and B class I alleles were obtained from the IMGT/HLA Database [91, 113], release 2.9. Alleles with incomplete sequences were removed from further analysis. The residues forming the B and F peptide binding pockets were aligned as described in the Results section.

Peptide binding motif reference panel

The main anchor peptide binding motifs recognized by HLA-A and -B molecules were compiled from our own data, or as reported at the SYFPEITHI database [87]. The HLA supertype associated with each motif was assigned as defined previously [11, 12], and/or on the basis of the broad chemical specificity of each supertype indicated in Tables 1 and 2.

Pocket analysis

The residues forming the B and F pockets of each allele in the motif reference panel were compiled as a lookup table in Microsoft Excel. Pockets were defined more stringently by considering all residues forming the pocket, or less stringently by considering only a subset of residues, denominated as key residues, hypothesized to be most directly involved in peptide binding. To assign a B and F pocket specificity for each of the remaining alleles, the reference panel was scanned to identify exact pocket sequence matches. If an exact match could be identified, the allele was assigned the corresponding B or F pocket specificity. Supertype assignments were then made by matching the B and F pocket specificity pattern with the supertype descriptions indicated in Table 2. If an exact match for either the full pocket sequence or key residue sequence could not be made at both the B and F pockets, the allele was considered unassigned.
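The final step, mapping a B and F pocket specificity pattern to a supertype, amounts to a table lookup. The sketch below is illustrative only: the two entries paraphrase the A2- and A3-supertype descriptions given in the Background and are not a transcription of Table 2, the specificity labels are hypothetical strings, and returning "unassigned" for a pattern absent from the lookup is an assumption of this sketch.

```python
# Illustrative lookup keyed by (B pocket specificity, F pocket specificity).
# Only two entries are shown, paraphrased from the Background; this is not
# a transcription of Table 2.
SUPERTYPE_BY_PATTERN = {
    ("aliphatic hydrophobic", "aliphatic hydrophobic"): "A02",
    ("small or aliphatic", "basic"): "A03",
}


def assign_supertype(b_specificity, f_specificity, table=SUPERTYPE_BY_PATTERN):
    """Return the supertype for a B/F specificity pattern, or 'unassigned'.

    A specificity of None means that no full or key residue match was
    found for that pocket, in which case the allele is left unassigned.
    """
    if b_specificity is None or f_specificity is None:
        return "unassigned"
    # Assumption: a pattern not present in the lookup is also unassigned.
    return table.get((b_specificity, f_specificity), "unassigned")
```

For example, `assign_supertype("small or aliphatic", "basic")` returns `"A03"`, while an allele lacking an F pocket match, `assign_supertype("small or aliphatic", None)`, is reported as `"unassigned"`.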