Background

The established associations between the highly polymorphic MHC loci and several human diseases have elucidated the possible genetic basis of their predisposition. [1, 2] From a classical approach of mapping an MHC allele with a particular disease, the focus has shifted to determine the specific peptides presented to MHC molecules with clearly defined sequences. Different MHC alleles recognize different peptides and the binding probabilities of natural and non-natural peptide ligands to MHC molecules are non-static. [3] The current challenge is to screen the sequences for candidate MHC ligands or tissue specific disease-inducing peptides as relevant T-cell epitopes. Identification of T-cell epitopes associated with a particular disease can lead to the development of potential peptide vaccines. [4] Such epitopes also find application in tetramer staining as powerful immuno-markers for estimating antigen specific T cells during pathogenesis. [5] Establishing MHC binding differences to mHags (minor histocompatibility antigen peptides) will guide the interpretation of HA-1 related GvHD (Graft vs Host Disease) data in the context of different MHC alleles. [6, 7] However, additional parameters describing the mechanism of peptide processing, peptide transport, loading of peptide to MHC molecules and presentation of MHC-peptide complexes for inspection by T cells are crucial in epitope selection. [8, 9]

The successful sampling of short peptides from a pool of viral or bacterial protein sequences using MHC-peptide binding prediction programs depends on the accuracy of their algorithms. A number of computational methods have been developed for the prediction of MHC-peptide binding [1026]. Using data from allele specific binding experiments – sequence binding motif analysis [10]; weight matrices [1113], ANN [1416], HMM [17] and iterative stepwise discriminant meta-algorithm [18] have been applied for predictions. These algorithms have been used to predict peptide binding to very few MHC molecules because binding data is not available for many alleles. Protein threading [1922] and side-chain packing [2326] techniques have been applied in molecular mechanics based MHC-peptide binding predictions. The molecular mechanics based binding prediction approach can be extrapolated to a wide range of MHC molecules defined by sequence nomenclature. The prediction of peptide binding to MHC molecules is described as a two-fold problem, the first being protein folding [27] and the second molecular interactions. [2830] The problem of packing sidechains using a near native backbone has been solved. [27] Generating peptide backbones sufficiently close to the native backbone to allow packing algorithms is still a challenge. [31] Hence, methods for predicting backbone conformation are not as developed as that for sidechains.

Data for a number of A*0201, A*6801, B*0801, B*2705, B*3501, B*5301, H-2KB, H-2DB, H-2DD, H-2LD, DR1, DR2, DR3, DR4, I-AD MHC-peptide crystal complexes are available in the PDB. A comprehensive report mapping MHC sequences with X-ray crystal structures and relative binding strength is available. [32] Recently, Cano and Fan conceptualized peptide binding to MHC by algebraic and geometric frameworks using structural data. [33] All MHC alleles have more than 70% sequence identity with known MHC structures. [25] This allows structure prediction for the known 1,500 HLA sequences [34] using known templates. [25] Currently, accurate prediction of peptide structures in the MHC groove is not reliable due to the limited availability of peptide backbone templates and the shortcomings in the existing peptide backbone prediction methods. Using independent procedures, Schuler et al., [21, 22] and Rognan et al. [23] have demonstrated a method for peptide backbone selection and showed a reasonable improvement in the MHC-peptide binding prediction. [2123] An accurate prediction of the peptide structure in the groove can be achieved through the appropriate selection of backbone templates for threading or side chain packing. The critical nature of the backbone conformation that affects MHC-peptide binding will be interesting to investigate. The nature of inter-atomic interactions at the MHC-peptide interface has been studied for individual complexes by protein crystallographers. However, there is no comprehensive report available describing the common types of interactions in a set of MHC-peptide complexes characterized by MHC allele variation and peptide sequence diversity. The objective of this study is to find which types of inter-atomic interactions contribute more in defining the binding between peptides and MHC molecules.

Results

The available data in the PDB are redundant and hence we created a non-redundant set from those entries with the best resolution for the related structural complexes having identical sequence information. The non-redundant dataset consists of twenty-eight class I MHC-Peptide complexes and ten class II MHC-peptide complexes. All the complexes chosen for the study are characterized by variation in sequences constituting the MHC-peptide complexes. The binding of MHC and peptide can be described using inter-atomic interactions based on backbone and sidechain atom preference at their interface.

We calculated the percentage prominence for each of the four types of interactions (BB, SS, BS and SB) at the interface of these complexes (Figures 1 and 2). The backbone or sidechain atom preference at the interface induced by MHC-peptide sequence variation is estimated by calculating the mean percentage for each type in the dataset (Figures 3 and 4). The preferences for the interaction types are found to be similar between complexes but not identical (Figures 1 and 2). Therefore, we calculated the standard deviation about the mean percentage preference for each of the interaction types in both the data sets (Figures 5 and 6).

Figure 1
figure 1

Percentage distribution of the four interaction types at the interface of class I MHC-peptide complexes. Inter-atomic distances are expressed in Å units.

Figure 2
figure 2

Percentage distribution of the four interaction types at the interface of class I MHC-peptide complexes. Inter-atomic distances are expressed in Å units.

Figure 3
figure 3

Mean percentage distribution of the four interaction types in class I MHC-peptide complexes. Inter-atomic distances are expressed in Å units.

Figure 4
figure 4

Mean percentage distribution of the four interaction types in class II MHC-peptide complexes. Inter-atomic distances are expressed in Å units.

Figure 5
figure 5

Standard deviation about the mean percentage distribution of the four interaction types in class I MHC-peptide complexes. Inter-atomic distances are expressed in Å units.

Figure 6
figure 6

Standard deviation about the mean percentage distribution of the four interaction types in class II MHC-peptide complexes. Inter-atomic distances are expressed in Å units.

SS and SB interactions are prevalent compared to the other two types (Figures 3 and 4). This observation is true for both class I and class II MHC-peptide complexes. SB interactions are prevalent than SS interactions at 3Å cut-off distance in these molecules. From 3.5–6Å SS interactions dominate over SB interactions in class I complexes. At inter-atomic distances greater than 6Å, SS and SB interactions are just as prevalent. However, SB interactions are relatively dominant over SS in class II complexes.

SS and SB interactions are influenced by MHC sequence polymorphism and peptide sequence diversity. The mean percentage distribution is maximum at an inter-atomic distance of 3Å for SS and SB interactions (Figures 3 and 4). However, the distribution of standard deviation remains at a maximum for inter-atomic distance ranges of 2–3Å in both the classes of MHC-peptide complexes (Figures 5 and 6). The standard deviation for SB type interactions is the highest in these complexes and this explains the sequence induced variation in peptide backbone/MHC sidechain atom preference during MHC-peptide binding. The sequence induced deviation for inter-atomic interactions is also observed for SS in class I complexes. It is interesting to note that the presence of BB and BS interactions in these complexes is limited compared to the other two types and the standard deviation is also minimal (Figures 3 and 4). Our results explain the consistent prevalence of SS and SB interactions at the MHC-peptide interface.

Discussion

The differential binding of peptides to diverse MHC molecules during cell-mediated immune response has been fairly established using MHC-peptide structural data obtained by X-ray crystallography. [32, 35, 38] The available biochemical binding data obtained by kinetic studies [36, 37] complements the structural explanation for MHC-peptide binding. [32, 35, 38] The structural similarity between known MHC alleles allows for side-chain prediction procedures to be carried out for other MHC molecules using available structural templates. [25, 39] However, model building of a user defined peptide sequence in the groove using sidechain packing techniques requires reliable backbone templates. The prediction of allele specific peptide structures depends on the selection of peptide backbone from a template library. [21] The root mean square deviation for peptide backbone atoms (N, Cα, C and O) lies within 2.1Å among structure groups based on allele specificity and peptide length. [21] Thus, it is possible to select the most appropriate peptide backbone template for predicting the structure of a user defined peptide sequence in the groove. In this approach, the peptide structure in the groove is constructed by threading and its compatibility to bind is evaluated by statistical pair-wise potentials. [21, 22] These pair-wise potential tables emphasize either hydrophobic [40, 41] or hydrophilic interactions [42] at the interface. The efficient prediction of peptide side-chain conformations in the groove has been shown mainly due to van der Waals contribution. [21] An independent study used a simple and fast free energy function (Fresno) to predict the binding free energies of peptides to class I MHC proteins. [23] This was based on the explicit treatment of ligand desolvation and unfavorable MHC protein-peptide contacts. A similar binding/non-binding grouping scheme was based on vdWC and SEHPR. [25] Despite sufficient knowledge on the chemical nature of molecular interactions very little is known about the common interaction types for MHC-peptide complexes. Here, we present the distribution of four types of inter-atomic interactions between MHC and peptide. SS and SB interactions are commonly found at the interface of these complexes. This implies that the backbone atoms in the MHC molecules play a secondary role in the binding of the peptide; it is the interaction between the sidechain atoms in the MHC molecules with both side-chain and backbone atoms in the peptide what determines the binding. Success in peptides designed to bind in the MHC groove has been achieved by carefully dissecting side chain interactions and placing appropriate flexible residues at key positions in the peptide. Hence, SS interactions are crucial for proper anchoring of short peptides within the groove. The SB interactions might facilitate an induced fit of the peptide during entry into the groove. The backbone conformation adopted by the peptide in the groove is important for maintaining the predominantly common SB interactions. Specific interactions by peptide sidechain atoms inside the groove may force its backbone to adopt a suitable conformation for maximal interactions with the receptor atoms.

Conclusions

The current challenge in MHC-peptide binding prediction is twofold: (1) accurate prediction of peptide backbone conformation for subsequent sidechain packing (2) accurate estimation of function by such predictions for quantitative MHC-peptide binding studies. Much of our earlier understanding on protein-ligand interactions is based on the steric factors, electrostatic contributions, hydrophobicity, hydrogen-bond donor or acceptor capability. Our results show the prevalence of backbone or sidechain atom preference at the MHC-peptide interface characterized by varying sequence composition. The prevalence of SB interactions in these complexes suggests the importance of peptide backbone conformation during MHC-peptide binding. The currently available protein structure prediction algorithms are well developed for protein sidechain packing upon fixed backbone templates. This study shows the prevalence of backbone atoms in MHC-peptide binding and hence highlights the need for accurate peptide backbone prediction in quantitative MHC-peptide binding estimation using molecular mechanics calculations. Development of an efficient energy function for the accurate prediction of both backbone and side-chain conformations followed by an effective MHC-peptide interaction function will help to quantify the differences in peptide binding caused by MHC polymorphism and peptide diversity. It should be noted that the conclusions reached in this article are based on the available crystal data. Additional data will be required to confirm the proposed hypothesis. If the efficiency of MHC-peptide binding prediction is carefully assessed for routine application then identification of T-cell epitopes from sequence information will become easier. Apart from peptide MHC specificity many other important parameters that describe cellular mechanisms such as enzyme-mediated antigen processing, peptide transport, loading of peptides to MHC molecules and the phenomenon of TCR repertoires have to be identified and incorporated into the prediction framework.

Methods and materials

MHC-peptide X-ray crystal data

X-ray crystal data for MHC-peptide complexes are retrieved from Protein Databank (PDB) (http://www.rcsb.org/pdb/). If more than one entry described an identical combination of MHC and peptide sequence information we selected the entry with the best atomic resolution (Å). We identified 28 non-redundant class I MHC-peptide complexes (Table 1) and 10 non-redundant class II MHC-peptide complexes (Table 2). The two sets of crystal complexes are examined for the different types of inter-atomic interactions at the interface.

Table 1 Class I MHC-peptide complexes in the protein databank
Table 2 Class II MHC-Peptide complexes in the protein databank

Data analysis

Inter-atomic interactions at the MHC-peptide Interface

The interactions between MHC and peptide are studied by measuring the distance between each atom in the MHC and each atom in the peptide. An atom in a MHC residue or a peptide residue is considered to be involved in MHC-peptide binding if the distance between any atom of the peptide and any atom of the MHC is less than or equal to x (Å). The value of x is varied from 0.0 to 10.0 (Å) at increments of 0.1 (Å). The total number of inter-atomic interactions at every value of x is grouped into four different types based on backbone and sidechain atom preference between MHC and peptide. Four types of inter-atomic interactions namely, BB (backbone MHC – backbone peptide), SS (sidechain MHC – sidechain peptide), BS (backbone MHC – sidechain peptide) and SB (sidechain MHC – backbone peptide) characterize the MHC-peptide interface based on backbone and sidechain atom preference.

Percentage distribution for the interaction types

Percentage distribution for the interaction types is defined as the percentage of each interaction type over all interactions for a given inter-atomic distance.