1 Introduction

Most sophisticated biological functions occur through the interaction of two or more components. Consequently, protein–protein interactions play critical roles in numerous processes, including the immune response, signal transduction and enzyme regulation [1]. With the rapid progress in structural biology in recent years, researchers have been able to expand their interest from the study of single protein structures to that of larger complexes with multiple components. Indeed, the analysis of molecular interactions at the atomic level, such as antigen–antibody interactions in immunity in vivo, is essential for interpreting the structural basis of biological functions. Understanding how an antibody recognizes an antigen and what determines the specificity of that recognition is crucial for successful vaccine design and continued improvement [2, 3].

The antigenic determinant or ‘epitope’ is presented to the host immune system and subsequently elicits an appropriate antibody reaction. In general, each epitope comprises only a few residues (6–20 residues), which is sufficient to produce an immune response [4, 5]. Structural biological methods and in silico docking prediction methods [6] can model antigen–antibody complexes, and analyses of solvent-accessible surface variation, non-bonding contacts, and the interaction energy can highlight the likely arrangement and potential involvement of specific residues at the interaction interface [7], a process known as epitope mapping. Unfortunately, this relatively static conformation of an antigen–antibody complex is not always sufficient to ascertain the particular residues that will be critical in the binding dynamics and associated energy stability. Thus, epitope residues are often further verified using routine experimental methods, such as site-directed mutagenesis [8] or peptide scanning [9], together with enzyme-linked immunosorbent assay (ELISA), western blotting, surface plasmon resonance (SPR) binding assays and/or analytical ultracentrifugation. However, these strategies are time-consuming and are often limited to alanine point mutations on the antigen rather than saturation mutations on both the antigen and the antibody. In addition, site-directed mutagenesis of an epitope amino acid might lead to misfolding of the protein, and thus a disruption of its function or binding capabilities [10], or even a failure in its expression [11]. Moreover, the peptide scanning approach is only suitable for mapping linear, not conformational, epitopes [12].

A quantitative model for binding energy based on an empirical force field has been proposed to determine the real hot spots on a protein–protein interface [13]. A knowledge-based hot spot prediction server was also developed using a machine learning approach based on experimental data [14, 15]. These methods provide general protocols to identify the hot spots that account for binding affinity. Here, we describe an in silico saturation mutagenesis method to predict strategic residues on an epitope that are energetically critical in mediating antibody binding. To this end, we employed the ‘Calculate Mutation Energy (Binding)’ program in Discovery Studio 4.1 (Accelrys, San Diego, CA) to evaluate a series of in silico site-saturation mutations. For these experiments, we employed data from three previously characterized crystal structures with associated alanine substitution mutagenesis results: two crystal structures pertaining to the hepatitis E virus (HEV) and one pertaining to the human immunodeficiency virus (HIV), in combination with their respective antibody binding data [16, 17]. HEV is a small, non-enveloped RNA virus that harbors a protrusion on the basal aspect of the virus, known as the E2s domain. This domain is responsible for host interactions and immunogenicity and is, consequently, the focus of much investigation. The p24 antigen of HIV is a small capsid protein that is frequently used to detect infection.

Calculations based on saturation mutagenesis predict the key residues required for antigen–antibody interactions, as determined by a significant increase in the free energy in response to a given mutation. These calculations also allow for the identification of potential key epitope sites for further experiments, provide guidance as to the type of amino acid substitution that will be applicable in subsequent experiments, and facilitate antigen design for vaccinations as well as strategies for antibody affinity maturation assays [18, 19]. Fluctuations in the environmental pH may also have a significant effect on binding affinity [20], and, thus, the effects of pH and protein ionization are also considered through energy-term parameterization [21].

Through these in silico interaction and binding energy analyses, we propose a different cutoff value for predicting key epitope residues in place of that suggested in the software guidelines. Furthermore, our findings with this approach agree well with the results of our previous in vitro alanine scanning mutagenesis experiments, demonstrating the potential utility of this approach for future rational drug or vaccine design.

2 Materials and Methods

2.1 Preparation of Immune Complex Structures

The crystal structures of HEV E2s genotype I, HEV E2s genotype IV and HIV p24 in complex with their respective antibody Fab fragments were determined previously [16, 17, 22]; the PDB IDs for these structures are 3RKD (HEV genotype I E2s with 8C11 Fab, E2s-I:8C11), 4PLK (HEV genotype IV E2s with 8G12 Fab, E2s-IV:8G12) and 3VRL (HIV p24 dimer with A10F9 Fab, p24:A10F9). Using the default CHARMm force field [21], these initial structures were subjected to 2000 steps of Smart Minimizer energy minimization to ensure that each reached an energy minimum before subsequent calculations were performed.

2.2 Delineation of the Interface Regions

The interface regions of the antigen–antibody complexes (E2s-I:8C11, E2s-IV:8G12 and p24:A10F9) were defined using a contact distance of up to 6 Å [23], in combination with a decrease in the solvent-accessible surface (SAS) area. Other criteria, including hydrophobic clusters, hydrogen bonding contacts, van der Waals forces, and electrostatic interactions, were also considered for interface definition. These analyses were conducted using the Analyze Protein Interface tool of Accelrys® Discovery Studio 4.1. Overall, several residues at the interface were defined as potential key epitope sites for further analysis.
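The distance criterion can be illustrated with a minimal sketch. This is not the Analyze Protein Interface tool; it only shows the 6 Å contact test on toy data. The coordinates, the atom lists and the function name `interface_residues` are assumptions made for illustration, and a real analysis would read atom coordinates from the PDB entries.

```python
import math

# Hypothetical toy coordinates in angstroms (NOT taken from 3RKD or any PDB entry).
# Each entry is (residue label, (x, y, z)) for one atom.
antigen_atoms = [
    ("Arg512", (0.0, 0.0, 0.0)),
    ("Arg512", (1.5, 0.0, 0.0)),
    ("Lys534", (20.0, 0.0, 0.0)),
]
antibody_atoms = [
    ("AsnL32", (4.0, 0.0, 0.0)),
    ("GlyH107", (0.0, 5.0, 0.0)),
]


def interface_residues(antigen, antibody, cutoff=6.0):
    """Return sorted antigen residues with any atom within `cutoff` angstroms of any antibody atom."""
    hits = set()
    for residue, xyz in antigen:
        if any(math.dist(xyz, other) <= cutoff for _, other in antibody):
            hits.add(residue)
    return sorted(hits)


print(interface_residues(antigen_atoms, antibody_atoms))
```

In practice the distance test would be combined with the SAS decrease and the other interaction criteria named above; the sketch covers only the distance part.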

2.3 In Silico Mutagenesis and Energy Evaluation

In silico mutagenesis was performed using the Calculate Mutation Energy (Binding) protein design tool embedded in Accelrys® Discovery Studio 4.1. This protocol assesses changes in the binding affinity of protein complexes in response to single-point mutations (in this case, alanine point mutations) among a selected set of amino acid residues. The mutation energy is defined as the free energy shift that occurs upon mutation, and the value is used to hypothesize the effect of a virtual mutation. CHARMm force fields (default settings) were employed to determine the free energy of the complexed and unbound states. A schematic representation of the process is shown in Fig. 1.

Fig. 1 Schematic representation of the entire process

The pH-dependent mode was set to true to report the binding energy differences. The electrostatic terms were calculated by integration over the proton binding isotherms, derived from the fractional protonation of acidic and basic residues, which in turn was calculated using the same method as in the Protein Ionization component [24]. The mutation energy was calculated as a function of pH as the difference between the mutant and the wild type:

$$\Delta\Delta G_{\text{mut}} = \Delta G_{\text{bind}}(\text{mutant}) - \Delta G_{\text{bind}}(\text{wild type}),$$

where $\Delta G_{\text{bind}} = \Delta G_{\text{AB}} - \Delta G_{\text{A-B separated}}$. Positive values of $\Delta\Delta G_{\text{mut}}$ represent a destabilizing effect of the mutation, whereas negative values represent a stabilizing effect.
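The sign convention can be made concrete with a short sketch. The function names and all numerical inputs below are hypothetical and serve only to show how the mutation energy follows from the two binding free energies; they are not values from this study.

```python
def binding_energy(g_complex, g_separated):
    """dG_bind: free energy of the complex AB minus that of the separated partners (kcal/mol)."""
    return g_complex - g_separated


def mutation_energy(dg_bind_mutant, dg_bind_wild_type):
    """ddG_mut = dG_bind(mutant) - dG_bind(wild type), in kcal/mol."""
    return dg_bind_mutant - dg_bind_wild_type


def effect(ddg_mut):
    """Interpret the sign of ddG_mut as described in the text."""
    if ddg_mut > 0:
        return "destabilizing"
    if ddg_mut < 0:
        return "stabilizing"
    return "neutral"


# Hypothetical free energies (kcal/mol), chosen only to illustrate the arithmetic.
dg_wild_type = binding_energy(g_complex=-120.0, g_separated=-110.0)
dg_mutant = binding_energy(g_complex=-118.0, g_separated=-110.0)
ddg = mutation_energy(dg_mutant, dg_wild_type)
print(ddg, effect(ddg))
```

With these illustrative inputs the mutant binds less favorably than the wild type, so the mutation energy is positive and the substitution is classed as destabilizing.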

The total free energy, $\Delta G_{\text{tot}}(\text{pH})$, of the bound or unbound state in pH-dependent mode is calculated as the following weighted sum of the van der Waals, electrostatic, entropy and non-polar terms (the electrostatic term includes the effects of both pH and ionic strength):

$$\Delta G_{\text{tot}}(\text{pH}) = a E_{\text{vdW}} + \Delta G_{\text{elec}}(\text{pH}, I) - c\,T S_{\text{sc}} + \Delta G_{\text{np}}.$$

In general, the proteins were treated as suspended in 1× phosphate-buffered saline (pH 7.45, ionic strength 0.15 M) to simulate a physiological environment.
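As a rough illustration of how the terms of the weighted sum combine, the sketch below evaluates it for one state. The weights `a` and `c` and every energy value are placeholders chosen for the example (the weights actually used by the software are not stated here), and the electrostatic term is taken as an already computed input at the chosen pH and ionic strength.

```python
import math


def total_free_energy(e_vdw, dg_elec, ts_sc, dg_np, a=1.0, c=1.0):
    """Weighted sum a*E_vdW + dG_elec(pH, I) - c*T*S_sc + dG_np (kcal/mol).

    dg_elec is assumed to be precomputed for the chosen pH and ionic strength;
    a and c are placeholder weights, not the software's defaults.
    """
    return a * e_vdw + dg_elec - c * ts_sc + dg_np


# Hypothetical term values in kcal/mol, for illustration only.
g_tot = total_free_energy(e_vdw=-50.0, dg_elec=-20.0, ts_sc=-4.0, dg_np=-3.0, a=0.5, c=0.5)
print(g_tot)
```

Evaluating the same function for the bound and unbound states, and for mutant and wild type, gives the quantities that enter the mutation energy.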

3 Results

3.1 Preliminary Epitope Residue Identification at the Antigen–Antibody Interface

Antigen–antibody interface residues for the three immune complexes were included in our mutagenesis simulations. In previous studies, we showed that the epitopes of 8C11 Fab and 8G12 Fab are located at different positions on the HEV E2s dimer surface: the epitope of 8G12 is located at the E2s dimerization region, whereas 8C11 recognizes and binds the E2s dimer on the opposite side [16, 17, 25]. The binding site of A10F9 Fab is located on one monomer of the shoulder-to-shoulder HIV p24 dimer [22]. Using ΔSAS comparison, hydrogen bonding contact analysis, and energetic calculations, we first omitted any residues that showed little or no involvement in the antigen–antibody interaction.

In the structure of E2s-I:8C11, the antibody recognition sites were identified as eight discontinuous regions on E2s: Thr476–Ala477, Glu479, Thr484–Tyr485, Asp496–Thr499, Val510–Leu514, Lys534, Asn573–Arg578, and Pro592. The relative SAS area (%) for each residue in the complex and on the individual antigen (E2s) was calculated and the ΔSAS values are listed in Table 1. These residues were exclusively distributed on the SAS area, and three major conformational patches (Asp496–Thr499, Val510–Leu514 and Asn573–Arg578) were involved in the center region of the antigen–antibody interface (Fig. 2). Among these major patches, Arg512 showed the strongest interaction with the complementarity determining region (CDR) of the antibody in terms of the number of hydrogen bonding contacts formed, which included a hydrogen acceptor from the side chain of AsnL32 and the more electronegative acceptor atoms from the main chains of PheL91 and GlyH107 (Fig. 2; the superscripted “L” denotes the light chain of the mAb and “H” the heavy chain).
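The ΔSAS comparison amounts to a per-residue difference between two relative SAS values. The sketch below assumes the sign convention "free antigen minus complex", so that a positive value means burial on binding; that convention, the placeholder residue names and the percentages are all assumptions for illustration and are not the entries of Table 1.

```python
def delta_sas(rel_sas_free, rel_sas_complex):
    """Per-residue change in relative SAS (percentage points): free antigen minus complex."""
    return {res: rel_sas_free[res] - rel_sas_complex[res] for res in rel_sas_free}


def buried_on_binding(dsas, minimum=0.0):
    """Residues whose relative SAS decreases by more than `minimum` upon complex formation."""
    return sorted(res for res, change in dsas.items() if change > minimum)


# Hypothetical relative SAS values (%); ResA and ResB are placeholder residues.
free = {"ResA": 60.0, "ResB": 40.0}
in_complex = {"ResA": 10.0, "ResB": 40.0}

dsas = delta_sas(free, in_complex)
print(dsas, buried_on_binding(dsas))
```

Here only the placeholder residue whose exposure drops on binding would be carried forward as an interface candidate.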

Table 1 Changes in solvent-accessible surface areas in the E2s-I:8C11 complex

Fig. 2 Interaction between 8C11 Fab and HEV E2s of genotype I. a Cartoon representation of an overall view of the E2s-I:8C11 Fab complex. The E2s of genotype I is in light pink, the light chain in yellow, and the heavy chain in light blue. Residues involved in the protein–protein interaction are shown in stick mode and colored by element. The carbon element of the key residue, Arg512, is shown in magenta whereas the other residues are shown in green. b Interface residues of E2s of genotype I shown in stick mode overlapping with the transparent surface. Three major conformational patches are shown in cyan, and the other region in yellow. c A close-up view of the interactions. Arg512 forms three hydrogen bonding contacts with GlyH107, AsnL32 and PheL91. Hydrogen bonding contacts are shown as red dashed lines. Key residues on the complementarity determining region (CDR) are shown in line mode. GlyH107 is in blue, and AsnL32 and PheL91 in aqua. d Details of the hydrogen bonding contacts

In the E2s-IV:8G12 complex, an analysis of the crystal structure suggested that the major interactions between the antigen and antibody may be predominantly mediated by fifteen predicted hydrogen bonding contacts. Glu549 and Pro592 were identified as the most energetically critical residues, each observed to make four hydrogen bonding contacts with the surrounding atoms of the antibody; these included the hydrogen acceptor atoms from GluL93, ThrL94 and TyrH105 toward Glu549, and from GlyH57 and GlnH58 toward Pro592. Furthermore, pi interactions from Lys554 to TrpL92 were observed. Most of the residues with significant interaction energy made close, strong hydrogen bonding contacts between 8G12 and the E2s (Fig. 3).

Fig. 3 Interaction between 8G12 Fab and HEV E2s of genotype IV. a Cartoon representation of an overall view of the E2s-IV:8G12 complex. The E2s-IV is colored in light pink, the light chain in pale yellow, and the heavy chain in light blue. Residues involved in protein–protein interactions are shown in stick mode. b Interface residues are shown as sticks overlapping with the background of the E2s-IV surface. The major conformation domain is shown in cyan. c A close-up view of the interactions. The key residues (Glu549, Lys554, Gly589 and Pro592) are shown with magenta-colored carbons; the others with green-colored carbons. Hydrogen bonding contacts are indicated with red dashed lines. Residues from the antibody are rendered in line mode, with the heavy chain in blue and the light chain in aqua. d Detailed interactions of the E2s-IV:8G12 complex

For the p24:A10F9 complex, we previously showed that the A10F9 Fab recognizes a conformational epitope via extensive pi–pi interactions (Pro207 located within an aromatic cage formed by TrpL92, TyrL32 and TyrH105), and a pi interaction between Leu205 and TyrH59 [22]. Residues involved in these interactions are listed in Table 2, among which Asp197 forms two hydrogen bonds with SerH52 and SerH56; Arg203 forms three hydrogen bonds with TyrH105, ValH106 and GluH100; and Leu205 forms one hydrogen bond with ThrL94. Furthermore, hydrogen bonds were detected between Pro207 and PheL91 or TrpL92. In addition, Ile201 maintains indispensable van der Waals interactions with residues on the antibody (Fig. 4; Table 2).

Table 2 Preliminary analysis of interactions in the p24:A10F9 complex

Fig. 4 Interaction between A10F9 Fab and HIV p24. a An overall view of dimeric p24:A10F9 in cartoon mode. Structure of p24 dimer is shown in gray, light chain of A10F9 Fab in pale cyan, and heavy chain in light blue. The interface residues are shown in stick mode and colored by element. b A close-up view of the interactions. The interface residues are shown in stick mode and colored by element. The key residues (Asp197, Ile201, Arg203, Leu205 and Pro207) are shown with magenta-colored carbons; the others with green-colored carbons. Hydrogen bonding contacts are marked with red dashed lines. Residues from the A10F9 Fab are rendered in line mode, with the heavy chain in blue and the light chain in aqua

3.2 In Silico Saturation Mutagenesis

In silico single-point alanine scanning and saturation mutagenesis were performed on each of the potential key epitope residues defined by the aforementioned interaction analyses. The mutation models were subjected to energy minimization using CHARMm force fields, and the difference between the mutant and wild-type models was quantified by calculating the energy change. According to the Discovery Studio guidelines, if the energy increase upon a mutation reaches a pre-defined cutoff value (0.5 kcal/mol), it can be inferred that the mutation will abrogate the binding or reduce the binding affinity, and the corresponding amino acid should be defined as a key site that participates in epitope recognition.
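For a single site, a saturation scan simply means substituting the wild-type residue with each of the other 19 standard amino acids. The sketch below enumerates those single-point mutants for one site; the helper name `saturation_mutations` and the three-letter label format are assumptions for illustration, not the input format of the Discovery Studio protocol.

```python
# The 20 standard amino acids in three-letter code.
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]


def saturation_mutations(site):
    """Expand a site label such as 'Arg512' into its 19 single-point mutants."""
    wild_type, position = site[:3], site[3:]
    if wild_type not in AMINO_ACIDS:
        raise ValueError(f"unknown residue type in site label: {site}")
    return [f"{wild_type}{position}{aa}" for aa in AMINO_ACIDS if aa != wild_type]


mutants = saturation_mutations("Arg512")
print(len(mutants), mutants[:3])
```

The alanine scan is the subset of this list ending in `Ala`; for a wild-type alanine the list would contain no alanine mutant at all.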

In the E2s-I:8C11 complex, only one mutation (Arg512Ala) was predicted to have a substantially higher free energy (4.84 kcal/mol; Fig. 5), which is consistent with its strategic position at the interface [16]. Replacement of Arg512 with Ala abrogated the formation of three hydrogen bonds with the 8C11 Fab (Fig. 6), and the interaction energy increased dramatically, thereby completely disrupting the binding, as shown by western blotting [16].

Fig. 5 Mutation energy landscape of in silico saturation mutation on the E2s-I:8C11 interface. a Mutation energy landscape of in silico single-point, saturation mutations performed on interface residues in the E2s-I:8C11 complex. b Critical residues with alanine mutation energy greater than 1 kcal/mol are shown in the red column. The 1 kcal/mol cutoff line is indicated in red, and alanine mutations are highlighted in red. Congruence between the mutation energy assay and the experimental mutations in our previous study is denoted as follows: “+”, consistent; “N/A” indicates no available experimental data. The cutoff value was reset to 1 kcal/mol, and residues exhibiting greater than 1 kcal/mol of mutation energy were deduced to be strategic for the antigen–antibody interaction

Fig. 6 Structure comparison of Arg512 and Ala512 of HEV E2s-I. Arg512 is shown in green, and Ala512 in pink. Residues from the Fab are shown in light blue. Hydrogen bonding contacts are displayed as red dashed lines. The Ala512 substitution causes a loss of two hydrogen bonds directed to the 8C11 Fab light chain

In comparison, the alanine scanning mutagenesis simulation of the E2s-IV:8G12 complex showed four main destabilizing effects that raised the interaction energy above 1 kcal/mol: Glu549 (2.03 kcal/mol), Lys554 (1.54 kcal/mol), Gly589 (1.12 kcal/mol) and Pro592 (1.61 kcal/mol). In the saturation mutation simulation, the four sites also showed higher energy changes (>1 kcal/mol) when mutated to any of the other 19 residues (Fig. 7). Except for Gly589, this high level of mutation energy is presumed to be mainly due to the loss of hydrogen bonding contacts and van der Waals interactions. The interaction analysis of the E2s-IV:8G12 interface also helped to elucidate how the epitope will be perturbed by mutations due to energy variation (Fig. 3).

Fig. 7 Mutation energy landscape of in silico saturation mutation on the E2s-IV:8G12 interface. a Mutation energy landscape of in silico single-point, saturation mutations performed on interface residues in the E2s-IV:8G12 complex. b Critical residues with alanine mutation energy greater than 1 kcal/mol are shown in the red column. The 1 kcal/mol cutoff line is indicated in red, and alanine mutations are highlighted in red. Congruence between the mutation energy assay and the experimental mutations in our previous study is denoted as follows: “+”, consistent; “–”, inconsistent

In the p24:A10F9 simulation, five alanine mutations showed significant variations in energy: Asp197 (1.24 kcal/mol), Ile201 (2.37 kcal/mol), Arg203 (3.47 kcal/mol), Leu205 (1.36 kcal/mol) and Pro207 (1.93 kcal/mol). Data from saturation mutagenesis supported the importance of these five residues, as most of the other substitutions at these sites also raised the free energy above the 1 kcal/mol cutoff (Fig. 8).

Fig. 8 Mutation energy landscape of in silico saturation mutation on the HIV p24:A10F9 interface. a Single-point saturation mutation calculations were performed on interface residues in the p24:A10F9 complex. b Critical residues with alanine mutation energy greater than 1 kcal/mol are shown in the red column. The 1 kcal/mol cutoff line is indicated in red, and alanine mutations are highlighted in red. Congruence between the mutation energy assay and the experimental mutations in our previous study is denoted as follows: “+”, consistent; “N/A” indicates no available experimental data

3.3 Threshold for Key Epitope Residues

The Discovery Studio guideline suggests a threshold of 0.5 kcal/mol for defining a destabilizing or stabilizing effect [26]. In our experience here, a value of 0.5 kcal/mol gives misleading results, as it would indicate that most of the residues at the interface are critical. Increasing this threshold to 1.0 kcal/mol gave a more accurate indication of the residues that contributed significantly to the interaction. Another physical model for binding energy hot spots, based on the CHARMM force field, suggested a similar cutoff value [13]; however, that method employs different energy terms in its calculation and different test models from ours. In our opinion, a cutoff of 1.0 kcal/mol should be regarded as the energy barrier that hampers an antigen–antibody interaction, and it is likely to be applicable to other types of protein–protein interactions. Residues with minor free energy variations after mutation were accordingly considered less engaged in epitope recognition. Overall, our findings show that saturation mutations can be used to identify key residues or unknown mutants between specified proteins quickly and effectively, to the same extent as or better than the routinely used in vitro alanine scanning mutagenesis assays. Furthermore, this technique is useful for both co-crystal structure and simulated modeling analyses for rational drug design.
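The effect of the two thresholds can be sketched with the alanine mutation energies quoted in Section 3.2. The dictionary below holds only those reported values; the extra entry `ResX` (0.70 kcal/mol) is a hypothetical placeholder, not a result of this study, added purely to show how a residue can pass the 0.5 kcal/mol guideline yet fall below the 1.0 kcal/mol cutoff. The function name `key_residues` is likewise an assumption.

```python
# Alanine mutation energies (kcal/mol) as quoted in Section 3.2.
REPORTED = {
    "E2s-I:8C11": {"Arg512": 4.84},
    "E2s-IV:8G12": {"Glu549": 2.03, "Lys554": 1.54, "Gly589": 1.12, "Pro592": 1.61},
    "p24:A10F9": {"Asp197": 1.24, "Ile201": 2.37, "Arg203": 3.47,
                  "Leu205": 1.36, "Pro207": 1.93},
}


def key_residues(energies, cutoff):
    """Return sorted residues whose mutation energy exceeds `cutoff` (kcal/mol)."""
    return sorted(res for res, ddg in energies.items() if ddg > cutoff)


for complex_name, energies in REPORTED.items():
    print(complex_name, key_residues(energies, cutoff=1.0))

# ResX is a HYPOTHETICAL residue with an illustrative energy, not data from the study.
example = dict(REPORTED["E2s-I:8C11"], ResX=0.70)
print(key_residues(example, cutoff=0.5))  # includes the placeholder
print(key_residues(example, cutoff=1.0))  # placeholder filtered out
```

All ten reported alanine energies lie above 1.0 kcal/mol, so they are retained under either threshold; only borderline values such as the placeholder are affected by the choice of cutoff.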

4 Discussion

Acute hepatitis E remains a major public health concern, particularly in developing countries, and it appears that prophylactic vaccines are likely to be required to prevent HEV infection [27, 28]. HIV is also an ongoing global concern. In our previous work, we interrogated the binding sites of three immune complexes pertaining to vaccine development for these two viruses using experimental alanine scanning mutagenesis and crystallization [16, 17, 22]. Here, we employed in silico alanine scanning mutagenesis simulations to confirm the essential epitope residues involved in these three complexes to further explore the opportunities available for the timely and cost-effective strategic evaluation of rational drug design.

In the E2s-I:8C11 model, only one candidate residue, Arg512, was identified as a critical site for structural stability by in silico mutation calculations, which agrees with our previous experimental findings [16]. In the E2s-IV:8G12 model, four critical residues (Glu549, Lys554, Gly589 and Pro592) showed elevated levels of free energy. The saturation mutational scanning of the p24:A10F9 complex indicated the importance of five epitope residues: Asp197, Ile201, Arg203, Leu205 and Pro207. Of these, Asp197, Arg203 and Leu205 have been verified previously using experimental EC50 calculations and sigmoidal trend fitting [22]; the other two residues were not included in the previous binding assay, and further tests are warranted.

The ‘Calculate Mutation Energy (Binding)’ protocol integrated in Discovery Studio provides a pH-dependent mode and a temperature-dependent mode for different applications, which need to be set independently. We found that the pH-dependent mode yielded a higher number of residues than the temperature-dependent mode, with the exception of Gly591, which was not indicated to be important in the E2s-IV:8G12 model. Through careful examination of the structure, we found that Gly591 lies in the loop formed by residues Gly589 to Pro592 and, given its distance from the complementary region of the antibody, would have been predicted to contribute less to binding. In addition, the molecular interaction analysis based on the CHARMm force field also suggested fewer substantial interactions between Gly591 and the 8G12 Fab than for other important residues. Comparing the structure of the Gly591 mutant model with the original wild-type model, we noted only a slight variation in the local structure. Yet, according to previous reports, a mutation of Gly591 induces significant changes in experimental binding, which is likely to indicate a drastic alteration in the local structure [17]. Indeed, a glycine-to-alanine mutation might shift the main chain direction and even cause severe changes in local structure. Therefore, we presume that the in silico mutation at Gly591 failed to generate a reasonable local structure, which may have lowered the accuracy of the mutation energy calculation and thus the indicated significance of this residue in binding. This hypothesis is supported by structure elucidation, as Gly and Pro show differences in flexibility or induce systematic conformational changes when substituted with other types of residues [17].

Despite this discrepancy, the proposed empirical cutoff value successfully predicted most of the key epitope residues for antibody recognition, in line with those determined using high-resolution complex crystal structures. The only missing site was likely due to insufficient optimization of the mutant structure, which might be overcome by additional energy minimization or dynamic simulation. Our study thus presents a reliable computational method for the initial screening of epitope residues before traditional mutation experiments, which would help to facilitate fundamental research on epitopes and vaccine design. Although the proposed cutoff value was in good agreement with the results obtained in previous in vitro alanine scanning experiments, additional studies with other known complexes should be conducted to confirm the validity of the assay and of this proposed cutoff value. In addition to predicting key epitope residues in antigen–antibody interactions, this method could also be extended to other types of protein–protein interactions, with point mutations also aiding in complex determination. Beyond the destabilizing effect induced by a mutation, which abolishes protein–protein binding, the stabilizing effect that increases binding affinity also deserves more attention, particularly in terms of improving the affinity of engineered antibodies [29, 30]. This type of information would provide an alternative computational approach in guided affinity maturation experiments. Prediction of stabilizing mutations is also useful in the rational, structure-based design of proteins to generate well-packed conformations [31].
Computational screening of the CDR through mutational trials before in vitro mutations are carried out would also help to circumvent the need for complicated traditional library methods, and one could expect that less time would be required to obtain antibodies with improved affinities. Although the cutoff value used here for the antigen–antibody analysis needs further exploration, this technique seems to provide an effective tool for epitope determination.