Introduction

The 5′ terminal structure of eukaryotic RNA polymerase II transcripts (RNA 5′ cap) plays a crucial role in gene expression and regulation. The cap is specifically bound by several cellular and viral proteins, including various isoforms of eukaryotic translation initiation factor eIF4E [1], nuclear cap-binding complex CBC [2], DcpS scavenger enzyme [3], poly(A) binding protein PABP [4], poly(A)-specific ribonuclease PARN [5], pokeweed antiviral protein PAP [6], cellular mRNA cap (guanine-N7) methyltransferase [7], human paraneoplastic encephalomyelitis antigen HuD [8], vaccinia virus 2′-O-methyltransferase VP39 [9], influenza virus RNA polymerase [10], and dimethyltransferase TGS1 [11]. The eIF4E factors from vertebrates and yeast were shown to be highly selective for the 7-methylguanosine cap (MMG-cap) [12], m7GpppN, where N = G, A, U or C. In the nematodes Caenorhabditis elegans and Ascaris suum, as well as in the parasitic flatworm Schistosoma mansoni, a large fraction of messenger ribonucleic acids (mRNAs) contain a hypermethylated cap form, the N2,N2,7-trimethylguanosine cap (TMG-cap), m3(2,2,7)GpppN, which is acquired along with a spliced leader during trans-splicing of pre-mRNA [13]. Affinity chromatography [14, 15] and fluorescence titration [16, 17] experiments showed that three out of five C. elegans eIF4E isoforms, IFE-1, IFE-2, and IFE-5, are capable of binding specifically to both the MMG-cap and the TMG-cap. Two other isoforms, IFE-3, most similar to mammalian eIF4E, and IFE-4, related to the mammalian 4E-homologous protein 4E-HP, bind only to the MMG-cap. The dual binding specificity was also observed for eIF4Es from S. mansoni [18] and A. suum [19]. Values of the equilibrium dissociation constants, Kd, for some selected complexes of the dual specificity eIF4Es with typical MMG-cap and TMG-cap analogues are shown in Table 1. The Kd values for murine eIF4E are also shown for comparison. 
The values reported by various groups and derived by various titration methods, mainly fluorescence and isothermal calorimetry, can differ from one another. Nevertheless, the experimental data clearly show that both C. elegans IFE-3 and murine eIF4E strongly discriminate in favor of the MMG-cap, by ca. 11 kJ mol−1 and 16 kJ mol−1, respectively, while in the case of C. elegans IFE-5 the binding energies are similar. The energy difference between the S. mansoni eIF4E complexes is ca. 5 to 7 kJ mol−1, depending on the titration experiment [18].
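
The energy differences quoted above are related to the ratio of the dissociation constants of the two complexes through ΔΔG° = RT ln(Kd,TMG/Kd,MMG). The following minimal sketch illustrates the conversion in both directions. The temperature of 298.15 K is an assumption (the titration temperatures are not given here), and the function names are hypothetical.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1

def ddg_from_kd_ratio(kd_ratio, temperature=298.15):
    """Free energy difference (kJ/mol) implied by a ratio of dissociation constants.

    kd_ratio = Kd(weaker ligand) / Kd(stronger ligand).
    temperature is in kelvin; 298.15 K is an assumed value.
    """
    return R * temperature * math.log(kd_ratio) / 1000.0

def kd_ratio_from_ddg(ddg_kj, temperature=298.15):
    """Inverse relation: Kd ratio implied by a free energy difference (kJ/mol)."""
    return math.exp(ddg_kj * 1000.0 / (R * temperature))

# Energy differences quoted in the text (kJ/mol), converted to Kd ratios.
for label, ddg in [("C. elegans IFE-3", 11.0), ("murine eIF4E", 16.0)]:
    print(label, round(kd_ratio_from_ddg(ddg)))
```

At the assumed temperature, a difference of 11 kJ mol−1 corresponds to a Kd ratio of roughly 85, which shows how strongly even moderate energy differences translate into binding selectivity.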

Table 1 Experimentally derived dissociation constants, Kd [μM], of the MMG- and TMG-cap analogues for the C. elegans factors IFE-3 and IFE-5 and for S. mansoni eIF4E. The Kd values for murine eIF4E are shown for comparison

The TMG-cap also occurs at the 5′ terminus of small nuclear RNA (snRNA), small nucleolar RNA (snoRNA) and in telomerase RNA TLC1 [11]. It is specifically recognized by snurportin1 [20], a receptor for spliceosomal small nuclear ribonucleoprotein particles (snRNPs).

As shown by X-ray crystallography [9, 12, 21–29] and multidimensional NMR [30], most of the cap-binding proteins have converged on a common mechanism of cap recognition via stacking of the 7-methylguanine moiety between two aromatic amino acid side chains. The 7-methylguanine base possesses a net positive charge, which seems indispensable for its proper recognition, i.e., the 7-methylguanosine cannot be replaced by guanosine in the cap structure. In the snurportin1 complex with the dinucleotide TMG-cap analogue [20] the sandwich stacking involves one tryptophan and two bases of the cap: the first one, trimethylated, and the second one, unmethylated. In dimethyltransferase TGS1 the 7-methylguanine moiety is stacked on only one tryptophan, and a serine polar side chain limits the binding pocket on the opposite side [11]. In addition to the cation-π stacking, the cap is stabilized by a network of hydrogen bonds, direct or water-mediated salt bridges, as well as less specific van der Waals and hydrophobic contacts. Only a few exceptions have been found so far in which the recognition specificity is entirely mediated through hydrogen bonds and van der Waals contacts with 7-methylguanine, namely cap methyltransferase [7] and reovirus polymerase λ3 [31].

In the mammalian, plant, and yeast eIF4E-cap complexes, two tryptophan residues take part in the sandwich cation-π stacking with the 7-methylguanine moiety. Two hydrogen bonds involve the N1H and N2H atoms of 7-methylguanosine and the carboxyl group of a conserved glutamic acid, and one hydrogen bond is observed between O6 of m7G and the backbone amide nitrogen. Additional stabilizing interactions occur between the phosphate chain of the cap and the arginine/lysine side chains of the protein. The first crystallographic structure of a dual specificity eIF4E, that from Schistosoma mansoni in complexes with two dinucleotide MMG-cap analogues [18], showed a binding mode similar to that of the single specificity eIF4Es. The only difference seems to be in the conformation of the E90 carboxylate of S. mansoni eIF4E, which is rotated by ∼80° in comparison to the orientation of the equivalent E103 [12, 23] in murine eIF4E. This precludes the formation of two strong hydrogen bonds with the 7-methylguanine moiety. A similar rotation was also observed in the human eIF4E ternary complex with eIF4GI peptide and a glycerol molecule located in the cap-binding center [32]. Still, the contribution of that conformational change to MMG-cap vs. TMG-cap binding specificity remains unclear. The NMR analysis of the MMG-cap and TMG-cap complexes with S. mansoni eIF4E [18] showed substantial chemical shift perturbation for ca. 15 amino acids, most of them distributed around the cap-binding pocket. Based on the crystallographic and NMR data the authors suggested that intrinsic and specific conformational flexibility of the S. mansoni eIF4E plays a crucial role in the TMG-cap binding, analogous to an “induced fit” mechanism. In contrast, combined mutagenesis studies and molecular dynamics simulations of the C. elegans dual specificity IFE-5 led to a “structural” rather than a “dynamic” model. 
A larger width and depth of the cap-binding pocket were postulated to be responsible for the TMG-cap binding specificity [17]. Replacement of two amino acids, N64Y/V65L, decreased the size of the pocket and gave rise to discrimination against the TMG-cap by steric hindrance. However, it was noted that the dual specificity A. suum eIF4E does contain Y64 and L65 residues [18]. Unfortunately, discrimination between TMG-cap and MMG-cap by snurportin1 is based on a mechanism that differs from that expected for the eIF4E factors [20, 33].

In order to gain insight into the mechanism of dual specificity in cap recognition by some highly homologous eIF4Es, we performed long molecular dynamics (MD) simulations in water for three selected eIF4E homologues, murine eIF4E as well as C. elegans IFE-3 and IFE-5, each of them in the apo form and in complexes with m7GDP or with m3(2,2,7)GDP (Scheme 1). The results point to a dynamic mechanism of discrimination between the mono- and hypermethylated cap structures.

Scheme 1

Chemical structures of the cap analogues. (a) m7GDP, analogue of MMG-cap, (b) m3(2,2,7)GDP, analogue of TMG-cap

Theory and methods

Initial setup

The starting structure of the complex of truncated murine eIF4E(28–217) with m7GDP for molecular dynamics simulations was taken from crystallography (PDB code: 1EJ1; [23]). The missing atoms of some of the amino acid side chains were added with SCWRL [34]. The hydrogen atoms were added in Insight II (Accelrys Software Inc., U.S.A.). The starting structures of the IFE isoforms were obtained by homology modeling with murine eIF4E(28–217) bound to m7GDP as a template, with 51% and 42% sequence identity to IFE-3 and IFE-5, respectively. The multiple sequence alignment was performed with CLUSTAL W [35]. Ten structures for each isoform were obtained using the program MODELLER [36]. Additional harmonic constraints were introduced for the distances between the protein and the cap atoms that were engaged in hydrogen bonds, salt bridges and van der Waals contacts. Subsequently, the resulting structures were subjected to detailed analysis regarding packing of the residues, steric hindrance, and loop conformations. Based on the analysis, one representative structure was chosen for each isoform. Due to high sequence homology the modeled IFE structures were very similar to that of the eIF4E template (Fig. 1), especially regarding the main polypeptide chains. The structures of the complexes of the three 4E factors with m3(2,2,7)GDP, and of murine eIF4E with GDP, were constructed by adding two methyl groups at N2 and by removing the methyl group from N7 in the protein-m7GDP complexes, respectively. The apo proteins were obtained by removing m7GDP from the complexes.

Fig. 1

Structural comparison of murine eIF4E(28–217) and its two C. elegans isoforms, IFE-3 and IFE-5. (a) Sequence alignment. (b) Superposition of the three protein complexes with m7GDP. The cap and the amino acids involved in its stabilization are marked in bold

The ESP charges of the isolated ligands were calculated at the HF/6-31G(d,p) level using Gaussian 94 (Gaussian Inc., Pittsburgh, PA, U.S.A.).

Molecular dynamics simulation and analysis

The MD simulations were carried out with the program Sigma [37] using the CHARMM22 force field [38]. Each protein or complex was subjected to energy minimization without electrostatic interaction and immersed in an equilibrated TIP3P water box [39] with a water shell at least 10 Å thick around the protein surface. The simulation procedure consisted of several equilibration MD runs preceded and followed by 500-step energy minimization, and a subsequent regular MD run, as follows. First, energy minimization and 48 ps dynamics of the water molecules were performed, keeping the protein or the complex immobilized. Second, energy minimization of the protein or of the complex with the water molecules kept fixed was followed by stepwise heating of the whole system from 50 K to 300 K over 82.56 ps. The initial velocities at each temperature were drawn from the Maxwell-Boltzmann distribution. The equilibration MD runs were performed in the nVT ensemble and the regular MD simulations were performed in the npT ensemble [40] at temperature T = 300 K and pressure p = 1 atm. The SHAKE algorithm [41] was applied to constrain the bonds. The electrostatic interactions were calculated by a multiple time step scheme [42] with a double cut-off at 6 Å and 10 Å. Short-, medium-, and long-range interactions, according to the particle-mesh Ewald method [43, 44], were calculated with integration time steps of 2, 4, and 12 fs, respectively. The simulations were run for 5 ns in the case of the apo proteins or for 10 ns in the case of the complexes, in order to reach at least partial equilibrium according to a stability criterion for the fluctuations of the root-mean-square deviation (RMSD) of the proteins’ Cα atoms.
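
The assignment of initial velocities from the Maxwell-Boltzmann distribution during stepwise heating can be illustrated with a minimal sketch. It is not the procedure implemented in Sigma: the 50 K temperature increment, the atom count, and the function names are assumptions made for illustration, and the degrees of freedom removed by SHAKE constraints are not taken into account.

```python
import math
import random

K_B = 1.380649e-23        # Boltzmann constant, J/K
AMU = 1.66053906660e-27   # atomic mass unit, kg

def sample_velocities(masses_amu, temperature, rng=None):
    """Draw one velocity vector (m/s) per atom from the Maxwell-Boltzmann distribution.

    Each Cartesian component is Gaussian with zero mean and
    standard deviation sqrt(kB * T / m).
    """
    rng = rng or random.Random()
    velocities = []
    for m_amu in masses_amu:
        sigma = math.sqrt(K_B * temperature / (m_amu * AMU))
        velocities.append(tuple(rng.gauss(0.0, sigma) for _ in range(3)))
    return velocities

def kinetic_temperature(masses_amu, velocities):
    """Instantaneous temperature from equipartition over 3N degrees of freedom."""
    twice_ke = sum(m * AMU * sum(c * c for c in v)
                   for m, v in zip(masses_amu, velocities))
    return twice_ke / (3 * len(masses_amu) * K_B)

# Hypothetical heating schedule: only the end points (50 K and 300 K) come
# from the text; the 50 K increment is an assumption.
masses = [12.011] * 5000  # carbon-like atoms, illustrative only
rng = random.Random(0)
for target in range(50, 301, 50):
    v = sample_velocities(masses, target, rng)
    print(target, round(kinetic_temperature(masses, v), 1))
```

For a system of a few thousand atoms the instantaneous temperature of the sampled velocities scatters by about one percent around the target value, so each heating stage starts close to the intended temperature.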

The conformations of the solute along each simulated trajectory were saved every 0.96 ps and analyzed with regard to interatomic distances and torsion angles. Essential dynamics (ED) analysis of selected equilibrium parts of the MD trajectories was performed according to Amadei et al. [45].
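
The two geometric quantities monitored along the trajectories, interatomic distances and torsion angles, can be computed from Cartesian coordinates as in the following minimal sketch. The frame representation (a dictionary from atom labels to coordinates) and the function names are hypothetical, and the coordinates in the example are invented for illustration only.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(a, b):
    """Euclidean distance between two points (same units as the input)."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Components of b0 and b2 perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def distance_series(frames, atom_a, atom_b):
    """Distance between two labeled atoms in every frame of a trajectory."""
    return [distance(f[atom_a], f[atom_b]) for f in frames]

# Toy trajectory with invented coordinates (in Å); labels are illustrative.
frames = [
    {"K159:CA": (0.0, 0.0, 0.0), "S209:CA": (12.0, 0.0, 0.0)},
    {"K159:CA": (0.0, 0.0, 0.0), "S209:CA": (9.0, 0.0, 0.0)},
]
FRAME_SPACING_PS = 0.96  # saving interval stated in the text
for k, d in enumerate(distance_series(frames, "K159:CA", "S209:CA")):
    print(round(k * FRAME_SPACING_PS, 2), d)
```

Because distances and torsion angles are internal coordinates, they do not depend on the overall rotation or translation of the protein, so no superposition of frames is needed before they are evaluated.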

Results

An experimentally observed equilibrium association constant Kas, expressed in terms of the molar concentrations of the reactants in a protein-ligand association, is related to the standard Gibbs free energy ΔG° of the association process at temperature T by \( \Delta G^\circ = -RT \ln K_{\mathrm{as}} \). Hence Kas is a quantitative measure of the ligand affinity for the protein. Comparison of the Kas values for a series of structurally modified cap analogues enabled parsing of ΔG° into separate contributions from various stabilizing contacts inside the eIF4E cap-binding pocket [12]. Bearing in mind the approximate character of the approach due to the lack of additivity of the entropic terms [46, 47], the combination of the crystallographic structure with such ΔG° analysis provided a molecular mechanism of specific binding between the cap and eIF4E [12]. However, attempts to apply the procedure to detect the discrimination mechanism between the MMG- and TMG-cap by some eIF4Es [17, 18] have failed. The structures of the IFE isoforms derived by homology modeling were very similar to that of the eIF4E template due to high sequence homology (Fig. 1). Therefore, structural differences of potential importance for the MMG- vs. TMG-cap binding selectivity have not been identified. This prompted us to evaluate a discrimination mechanism of a dynamic type. The equilibrium association constant, \( K_{\mathrm{as}} = k_{+1}/k_{-1} \), depends on the ligand's ability to form and leave the complex, expressed by the kinetic rate constants k+1 and k-1, respectively. Higher k+1 and/or lower k-1 values give rise to an increase of Kas. Since it was impossible to calculate the rate constants theoretically from all-atom MD simulations, we assumed that the MD analysis of the apo proteins provided some information on k+1, which reflects the accessibility of the binding sites of the three eIF4Es to the MMG-cap analogue and to the TMG-cap analogue. 
Similarly, the MD analysis of the three factors, each bound to either MMG-cap or TMG-cap, might provide some hints on the stability of the complexes, which influences their dissociation rate constants k-1. It should be emphasized that the simulated apo structures were generated by removing the ligand, and therefore they differ from those derived experimentally, at least in the case of the NMR structure of murine apo-eIF4E [48]. The simulated apo structures may be regarded as approximations of the final conformations into which the ligands would dock, especially if the cap phosphate chain anchors first, as postulated by a two-state model of the cap binding [12] (see below). It is also worth noting that the large structural fluctuations observed in the NMR structure of apo-eIF4E do not occur during our MD simulations.
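
The kinetic argument can be summarized in a short derivation, using the standard relation between the free energy of association and the association constant, \( \Delta G^\circ = -RT \ln K_{\mathrm{as}} \). The superscripts MMG and TMG are notation introduced here for illustration, and a single-step association is assumed for simplicity. For a given factor, the selectivity between the two cap types splits into an association term and a dissociation term:

\[
\Delta\Delta G^\circ = \Delta G^\circ_{\mathrm{TMG}} - \Delta G^\circ_{\mathrm{MMG}}
= -RT \ln \frac{K_{\mathrm{as}}^{\mathrm{TMG}}}{K_{\mathrm{as}}^{\mathrm{MMG}}}
= RT \ln \frac{k_{+1}^{\mathrm{MMG}}}{k_{+1}^{\mathrm{TMG}}}
+ RT \ln \frac{k_{-1}^{\mathrm{TMG}}}{k_{-1}^{\mathrm{MMG}}}
\]

The first term is positive when the TMG-cap analogue enters the binding pocket more slowly than the MMG-cap analogue, and the second term is positive when the TMG-cap complex dissociates faster. The apo simulations address the first term qualitatively, and the simulations of the complexes address the second.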

The MD trajectories of the apo eIF4Es display enhanced flexibility of the loops around the entrance to the cap-binding pocket (Fig. 2), especially S1–S2 and S7–S8 loops, while the secondary structure elements remained unchanged. This general view of the dynamic behavior is confirmed by the experimental data derived for the apo form of human eIF4E by multidimensional NMR [48], for the cap-free human eIF4E in the complex with glycerol that seems to represent an intermediate state between the apo and the cap-bound form [32], and for the murine factor by hydrogen-deuterium exchange combined with electrospray mass spectrometry [49]. The secondary elements were preserved in apo eIF4E while the loops exhibited mobility on the ns-ps time scale that became abrogated upon the cap binding. The structural differences in the regions of loops S1–S2, S3–S4, S5–S6, and S7–S8 (Fig. 2a) resulted in the formation of the positively charged pocket to anchor the cap phosphate chain (loops S1–S2 and S7–S8), and the formation of the stacking triad and hydrogen bonding with the 7-methylguanine moiety via locking the W56 hinge (loop S1–S2) and rotating W102 (loop S3–S4) into the cap-binding site. The ability of W102 containing loop S3–S4 to rotate from the open (apo) state into the closed form was confirmed by both crystallography [32] and NMR studies [48]. Moreover, an alternative orientation of W102 indole ring was found by the diffraction studies of the eIF4E complexes with 7-benzyl-GMP and 7-(p-fluorobenzyl)-GMP. The W102 residue flips through 180° [50] in comparison with those complexes which contain the 7-methylguanosine caps [12, 23, 24]. All those observations agree with the two-state model of the cap-binding [12], in which anchoring of the phosphate chain was followed by a cooperative formation of the stacking triad and hydrogen bonds. 
This model, derived from analysis of the fluorescence titration of eIF4E with structurally modified cap analogues, was confirmed by stopped-flow kinetic experiments [51, 52]. In contrast to our analysis, Slepenkov et al. [53] performed and analyzed the stopped-flow experiments differently, and postulated a one-step mechanism. However, our results are supported not only by the standard analysis of the stopped-flow kinetic traces under pseudo-first-order conditions [51] but also by experiments run under second-order conditions combined with numerical integration of the appropriate differential kinetic equations [52] using the program DynaFit [54].

Fig. 2

Analysis of the cap accessibility to the binding pockets. (a) Location of the flexible loops in the eIF4E structure. (b) Time course of the distances between the S7–S8 and S5–S6 loops (upper), measured for Cα atoms of S209 and K159 for apo-eIF4E and apo-IFE-3, and for Cα atoms of Q217 and K159 for apo-IFE-5, and between the S5–S6 and S1–S2 loops (lower), measured for Cα atoms of K159 and R52. The lines are marked as follows: apo-eIF4E dotted, apo-IFE-3 dashed, apo-IFE-5 solid

The flexibility of the loops seems to be crucial for the discrimination between m7GDP and m3(2,2,7)GDP by murine eIF4E and C. elegans IFE-3 and IFE-5. The calculated distances between S7–S8 and S5–S6, and between S5–S6 and S1–S2, on the final parts of the MD trajectories (Fig. 2b) are ca. 10 Å greater for IFE-5, which binds the TMG-cap, than for IFE-3 and murine eIF4E, which are specific for the MMG-cap only. Hence, the TMG analogue with two additional methyl groups can easily penetrate the IFE-5 binding pocket, in contrast to the other factors.

A similar analysis of the dynamics of the protein-cap complexes shows stable contacts between the cap phosphates and the arginine/lysine side chains, irrespective of the bound analogue. This is consistent with the anchoring character of the phosphate groups in the cap stabilization inside the eIF4E binding slot [12]. In contrast, the mutual orientations of the rings in the cation-π stacking triad undergo larger fluctuations with respect to the starting structure. As expected, the largest changes are observed in the eIF4E-GDP complex, which is stabilized by weaker π-π stacking, i.e., a perpendicular orientation of the W56 and G rings and a shift of W102 deeper into the binding pocket. In all the other complexes the stacking triad is kept principally unchanged, with relatively larger fluctuations of W102. A temporary increase of the distance between the W102 and m7G rings is observed in the eIF4E-m3(2,2,7)GDP and IFE-3-m3(2,2,7)GDP complexes, but not in the IFE-5-m3(2,2,7)GDP complex. On the other hand, the hydrogen bond between the N2-amino group of m7G and the E103 carboxylate (Fig. 3) is repeatedly broken and reformed due to shifts of the E103 side chain into the bulk solvent. Inherent “plasticity” of that part of the cap-binding center, resulting in a rotation of the E103 side chain and a movement of the W102 side chain, was also observed in the cap-free eIF4E ternary complex with eIF4GI peptide and glycerol in the cap-binding pocket [32]. Therefore, the low affinity of the TMG-cap for murine eIF4E cannot be explained by the lack of that hydrogen bond due to the hypermethylation.

Fig. 3

Time dependence of the distance (in Å) between N2 of m7GDP and Glu103 side chain carboxyl in murine eIF4E (dotted), IFE-3 (dashed), and IFE-5 (solid)

The distance between loops S7–S8 and S5–S6 in the eIF4E-cap complexes (Fig. 4) is shown to correlate with the affinity of various cap types for the eIF4E isoforms. Binding of m7GDP to murine eIF4E results in bending of S7–S8 toward S5–S6, which might help to keep the ligand in the binding centre by a water-mediated interaction between K159 and, e.g., S209. As shown previously [55], K159 is very important for binding the capped mRNA. Closing the entrance to the cap-binding pocket by decreasing the distance between S209 and K159 was also observed in MD simulations of phosphorylated eIF4E [25]. In the complexes with the low affinity ligands, m3(2,2,7)GDP and GDP, the distance between the loops is larger, thus making it easier for those ligands to leave the complexes. Similarly, the distance between the loops in the IFE-3 complex with m7GDP is much smaller than in the IFE-3 complex with m3(2,2,7)GDP, while for both IFE-5 complexes it is kept fairly large, irrespective of the ligand. The overall dynamics of the cap-bound eIF4E isoforms was also analyzed by ED analysis of the covariance matrix of the atomic displacements [45]. The scalar products of the vectors representing the normalized Cα displacements and the eigenvectors corresponding to the largest eigenvalue (λ = 1) show non-Gaussian distributions with 2–3 maxima. This can be interpreted as correlated, long-range movements, in which the loops oscillate around several mean positions (Fig. 5). The histograms for the scalar products of the eigenvectors corresponding to lower eigenvalues (λ = 10) are Gaussian, as expected for equilibrated, independent and harmonic motions.
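
The projection step of the ED analysis can be sketched as follows: build the covariance matrix of the displacements from the mean structure, take the eigenvector of the largest eigenvalue, project every frame onto it, and inspect the histogram of the projections. This is a minimal illustration on an invented two-state toy trajectory, not the analysis performed on the eIF4E trajectories. Power iteration is swapped in for a full diagonalization to keep the sketch free of external dependencies, the frames are assumed to be already superimposed, and all function names are hypothetical.

```python
import math
import random

def covariance_of_displacements(frames):
    """Return (covariance matrix, displacements) for frames of flat 3N coordinates."""
    n = len(frames)
    mean = [sum(col) / n for col in zip(*frames)]
    disp = [[x - m for x, m in zip(frame, mean)] for frame in frames]
    dim = len(mean)
    cov = [[sum(d[i] * d[j] for d in disp) / n for j in range(dim)]
           for i in range(dim)]
    return cov, disp

def leading_eigenpair(matrix, iterations=200, seed=0):
    """Approximate the largest eigenvalue and its eigenvector by power iteration."""
    rng = random.Random(seed)
    vec = [rng.uniform(-1.0, 1.0) for _ in matrix]
    for _ in range(iterations):
        nxt = [sum(a * v for a, v in zip(row, vec)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    image = [sum(a * v for a, v in zip(row, vec)) for row in matrix]
    eigenvalue = sum(v * w for v, w in zip(vec, image))  # Rayleigh quotient
    return eigenvalue, vec

def project(disp, eigenvector):
    """Scalar product of each frame's displacement with the eigenvector."""
    return [sum(d * e for d, e in zip(frame, eigenvector)) for frame in disp]

def histogram(values, n_bins=10):
    """Counts of values in n_bins equal-width bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for v in values:
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

# Toy trajectory (invented): coordinate 0 switches between two states every
# 100 frames, all coordinates carry small Gaussian noise.
rng = random.Random(42)
frames = []
for k in range(400):
    state = 1.0 if (k // 100) % 2 == 0 else -1.0
    frame = [rng.gauss(0.0, 0.1) for _ in range(6)]
    frame[0] += state
    frames.append(frame)

cov, disp = covariance_of_displacements(frames)
eigenvalue, eigenvector = leading_eigenpair(cov)
projections = project(disp, eigenvector)
print(histogram(projections))
```

In this toy case the histogram of the projections on the first eigenvector has two separated maxima, which is the signature of motion between distinct mean positions. Projections on eigenvectors that describe only the noise would give a single, approximately Gaussian peak.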

Fig. 4

Analysis of the cap propensity to leave the cap-binding pocket. Time course of the distances between S7–S8 and S5–S6 loops, measured for Cα atoms of S209 and K159 (IFE-3, eIF4E), and for Cα atoms of Q217 and K159 (IFE-5). (a) Murine eIF4E bound to m7GDP (solid), to m3(2,2,7)GDP (dashed), and to GDP (dotted). (b) IFE-3 bound to m7GDP (solid bold) and to m3(2,2,7)GDP (dashed), and IFE-5 bound to m7GDP (solid) and to m3(2,2,7)GDP (dotted)

Fig. 5

Mobility of the apo- and cap-bound eIF4Es by essential dynamics analysis. Motions along the first (λ = 1), fifth (λ = 5), and tenth (λ = 10) eigenvectors obtained from the Cα coordinates covariance matrix, and the corresponding probability distribution for the displacements (nm), (a) eIF4E bound to m7GDP, (b) eIF4E bound to m3(2,2,7)GDP, (c) IFE-3 bound to m7GDP, (d) IFE-3 bound to m3(2,2,7)GDP, (e) IFE-5 bound to m7GDP, (f) IFE-5 bound to m3(2,2,7)GDP

Discussion

Knowledge of the molecular basis of the RNA 5′ cap structure recognition by the cap-binding proteins is a prerequisite for understanding possible mechanisms of cap function in various gene expression processes in eukaryotes, such as translation initiation, mRNA splicing, and export of RNAs to the cytoplasm. It seems that various evolutionarily unrelated cap-binding proteins have converged on a similar general mechanism of cap recognition. Subtle modifications of the general recognition mechanism may lead to differences in the protein functions, e.g., the diverse roles of the two aromatic amino acids that stack with the 7-methylguanosine moiety [56–58]. The methyl group at N7 in the cap structure imparts a net positive charge to the guanine ring, and results in more efficient stacking compared with the unmethylated base. Quantum mechanical calculations showed that a typical energy of the cation-π stacking of the m7G base in the complex with the tyrosine or tryptophan aromatic ring is in the range of −11.4 kcal mol−1 to −16.23 kcal mol−1, while the G/Y π-π stacking energy is ca. −6 kcal mol−1 [59]. Additional methyl groups at N2 do not change the stacking ability of the cap. The stacking energy of m3(2,2,7)G/W276 in snurportin1, −12.52 kcal mol−1, is close to typical values obtained for the m7G/W complexes.

The presence of two methyl substituents in the amino group of 7-methylguanosine breaks at least one stabilizing hydrogen bond in the cap-binding protein pockets and may lead to a substantial decrease of the association constants observed for the complexes with the TMG-cap. On the other hand, the dual specificity cap-binding proteins possess high homology and structural similarity to those that discriminate between the MMG-cap and the TMG-cap. Hence, it is a great challenge to conceive a molecular model of discrimination vs. dual specificity for the protein-cap association. The explanations usually do not go beyond formulations like “the differences in the size of the cap-binding pocket in the C. elegans isoforms of eIF4E” [20]. Our approach to elucidating the specificity of cap recognition can be expressed in terms of the ligand's ability to enter or leave the apo protein binding pocket, since the equilibrium association constant is determined by the ratio of the two rate constants. Although we were not able to calculate the values of the rate constants by all-atom MD simulations, the comparative analysis of the MD trajectories of the apo and cap-bound factors provides a more detailed explanation for the differences in the binding specificity of two C. elegans eIF4E isoforms, IFE-3 and IFE-5, than those published hitherto. The dynamic mechanism of the discrimination between two types of the cap may be ascribed to differences in mobility of the loops around the entrance to the protein cap-binding pockets, especially the S7–S8 loop. Our results also show higher rigidity of the cation-π stacking triad and of the stabilizing interactions (salt bridges, hydrogen bonds) involving the cap phosphate chain compared with the more flexible character of the hydrogen bonds. The results of our computer modeling are generally consistent with the experimental structural and dynamic data [32, 48, 49].

Conclusions

Discrimination between two types of the cap, MMG-cap and TMG-cap, consists neither in differences in the stacking energy nor in well-defined structural differences inside the cap-binding pocket. Both 7-methylguanosine and its hypermethylated form were found to stack equally well between two amino acid aromatic rings [59], and the structure of S. mansoni eIF4E in complex with the MMG-cap analogues [18] showed a mode of binding very similar to that of the single specificity eIF4Es. MD simulations based on the known structures of the cap-eIF4E complexes provided a means to evaluate the discrimination mechanism. In contrast to the comparative analysis of the “static” network of stabilizing contacts inside the cap-binding pockets of highly homologous eIF4Es, we took into account the differences in the dynamics of the formation and dissociation of the eIF4E-cap complexes. An exact specification of the role of particular amino acids in the proposed dynamic mechanism, e.g., their mutual interactions and/or their interactions with various cap structures, needs further investigation.