1 Introduction

Proteins constitute Nature’s toolbox: among other functions, they transport and digest compounds, provide structural integrity, regulate chemical processes and serve in immune responses in a cell [1, 2]. Strikingly, most of these functions are encoded by a set of 20 different amino acids that fold into a specific three-dimensional structure, which defines the function of the folded protein [2]. Determining protein structures at the atomistic level is a challenging task, while observing the dynamics of protein folding with a similar accuracy is yet to be achieved [2]. Protein folding has interested researchers for decades as it is closely related to the fundamental understanding of protein biosynthesis and functionality [3,4,5,6].

Modern computational tools allow atomistic simulations to be performed on the scale of hundreds of thousands of atoms and beyond [7,8,9,10,11,12]. However, the folding process of even the smallest proteins typically requires simulations on the order of hundreds of microseconds [13,14,15,16]. These timescales are feasible with specialised computational infrastructure for atomistic molecular dynamics (MD) [17], but timescales of a few microseconds are much more common [7,8,9,10,11,12, 18,19,20,21]. Computational architectures are thus still largely insufficient to perform all-atom protein folding simulations of arbitrary size routinely.

Fig. 1

A Illustration of the \(\hbox {A}_{11}\) peptide in a water box with some helical residues at the C-terminal and a coil structure at the N-terminal. Carbon, nitrogen, oxygen and hydrogen atoms are coloured in cyan, blue, red and light-grey, respectively. B Zoom-in on the peptide in A omitting water for clarity. C Definition of the backbone dihedral angles \(\phi \) (purple) and \(\psi \) (orange) of residue i inside an alanine polypeptide. The backbone atoms that determine these dihedrals are shown as larger spheres and are labelled. Atoms are coloured like in A

The problem of protein folding is conceptually related to the studies of conformational transitions in short polypeptide chains [4, 5, 22,23,24], and the dynamics of polypeptide chains can be described computationally at the relevant spatial and temporal scales. Several earlier studies employed ab initio and classical mechanics methods to investigate properties and dynamics of polypeptides in vacuum [22, 24,25,26,27,28]. It was shown that the backbone dihedral angles \(\phi \) and \(\psi \) (see Fig. 1C), the so-called twisting degrees of freedom, are the internal coordinates, which define the energy landscape of a polypeptide chain and permit describing the folding \(\leftrightarrow \) unfolding phase transition in polypeptides of varied length. The importance of backbone dihedral angles for the folding process is consistent with experimental data as the backbone dihedral angles are typically used to differentiate between a variety of structural conformations, i.e. the secondary structure motifs, in peptides and proteins [29,30,31].

The present investigation employs an extensive all-atom MD approach to study the folding \(\leftrightarrow \) unfolding process in alanine polypeptides (see Fig. 1A) in explicit solvent. The results permit analysing the internal dynamics of the polypeptides including the temporal evolution of the backbone dihedral angles, helix-propagation and helix-stability, which provides atomistic insights into the helix-formation process highlighting a difference in the helix-propagation dynamics towards their N- and C-termini. Using steered MD, we have established the potential energy surface for an individual residue in a solvated alanine polypeptide. This potential energy surface was used to explain the observed polypeptide backbone dynamics and was employed for a statistical mechanics description of the temperature-dependent behaviour of the polypeptide’s helicity for different peptide lengths.

2 Methods

MD simulations were performed using NAMD [7, 32] with the CHARMM36 force field including CMAP corrections for proteins [33,34,35] and explicit water solvent described by the TIP3P model. Simulation preparation and analysis were done using VMD [36], where STRIDE [37] was used for the assessment of secondary structures of the polypeptides. The peptides were constructed using PepMcConst [38], interfaced through the online platform VIKING [39].

2.1 Isobaric molecular dynamics of the \(\hbox {A}_{11}\) peptide

A peptide consisting of 11 alanine residues (A\(_{11}\)) was used to quantify the folding \(\leftrightarrow \) unfolding dynamics. It was constructed using PepMcConst [38] in a pure right-handed \(\alpha \)-helix conformation and solvated in a cubic water box spanning 50 Å in each direction, adding up to a total of 11,661 atoms (see Fig. 1A). Periodic boundary conditions were imposed, and no ions were added, corresponding to a salt concentration of 0 mol/l. The simulations were carried out using NAMD [7, 32] at a constant pressure of 1 bar using the Nosé–Hoover–Langevin piston pressure control [40, 41] with a period of 200 fs and a decay time of 50 fs. For the initial isobaric equilibration simulation of the system, the temperature was set to 300 K and controlled through the Langevin thermostat with a damping coefficient of 5 ps\(^{-1}\). A timestep of 1 fs was used to integrate the equations of motion for all atoms in the system. Non-bonded interactions were cut off at 12 Å, with smooth switching starting at 10 Å. Long-range Coulomb interactions were excluded to avoid artefacts from periodic boundary conditions for small simulation cells [42], especially in the case of the potential energy surface calculation.

The system was equilibrated for 250 ns. The equilibrated system was further simulated for 250 ns at 300 K to sample the data for analysis. Thereafter, the temperature of the system was increased by 2 K using the Langevin thermostat. The step-wise increase in temperature was repeated every 250 ns until the temperature of the system reached 350 K, resulting in 26 simulations at different temperature values and a total simulation time of 6.5 \(\upmu \)s.
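The heating protocol above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' workflow script; the names `temperatures` and `temperature_at` are hypothetical and only reproduce the stated numbers (2 K steps from 300 K to 350 K, 250 ns per step).

```python
# Sketch of the step-wise heating schedule (all names are illustrative).
T_START_K = 300   # first production temperature
T_END_K = 350     # last production temperature
T_STEP_K = 2      # temperature increment per step
RUN_NS = 250      # production time per temperature

temperatures = list(range(T_START_K, T_END_K + 1, T_STEP_K))
total_us = len(temperatures) * RUN_NS / 1000.0


def temperature_at(time_ns):
    """Temperature (K) of the production run at an elapsed time in ns."""
    index = min(int(time_ns // RUN_NS), len(temperatures) - 1)
    return temperatures[index]
```

With these values, `len(temperatures)` gives the 26 temperature steps and `total_us` the 6.5 \(\upmu \)s of production time quoted in the text.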

2.2 Potential energy surface calculations

The potential energy surface was computed for a central residue inside an \(\hbox {A}_{5}\) polypeptide constructed using PepMcConst [38]. The \(\hbox {A}_{5}\) peptide was solvated in a cubic water box spanning 30 Å in each direction, resulting in a system containing 2930 atoms. Prior to the MD simulation, the system was minimised for 2000 conjugate gradient steps. The temperature in the simulation was set to 310 K, while all other parameters were identical to those from the simulation described above. To construct the potential energy landscape of a residue inside a polypeptide, the simulations were performed with constrained values of the dihedral angles \(\phi \) and \(\psi \) (see Fig. 1C) for all residues of the \(\hbox {A}_{5}\) peptide. By constraining all backbone dihedral angles, the computed energy surface describes the residue-specific interactions in a segment of five or more residues with identical backbone dihedral angles. The potential energy surface for a residue was then computed as a function of these angles. Test calculations showed that five residues need to be constrained in order to produce potential energy surfaces, which match the known statistical distributions of the backbone dihedral angles. The dihedral angles were constrained using a harmonic potential of the form:

$$\begin{aligned} U = k \left( \sum _{i=2}^{5} \left( \phi _i - \phi _\mathrm{{ref}} \right) ^2 +\sum _{i=1}^{4} \left( \psi _i - \psi _\mathrm{{ref}} \right) ^2 \right) , \end{aligned}$$
(1)

where the force constant k was set to 200 kcal/mol in order to tightly control the dihedral angles in the simulation. The summations in Eq. (1) are performed over the residues in the peptide; (\(\phi _i\),\(\psi _i\)) denote the dihedral angle values for the ith residue (see Fig. 1C). The summation limits account for the fact that \(\phi \) is not defined for the first residue (N-terminal residue), while \(\psi \) is not defined for the last residue (C-terminal residue) in the polypeptide. The reference dihedral angles (\(\phi _\mathrm{{ref}}\),\(\psi _\mathrm{{ref}}\)) were initially set to \(\phi _\mathrm{{ref}}=-60^{\circ }\) and \(\psi _\mathrm{{ref}}=-60^{\circ } \) for the equilibration simulation, which was carried out for 10 ns in the isobaric ensemble and another 10 ns in the isochoric ensemble.
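As an illustration of Eq. (1), the restraint energy can be evaluated for a given set of backbone angles. This is a sketch under two assumptions that the text does not state: angle differences are wrapped onto \([-180^{\circ },180^{\circ })\), and k is interpreted per squared radian. The function names are hypothetical.

```python
import math


def wrap_deg(delta):
    """Map an angle difference in degrees onto the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def restraint_energy(phi, psi, phi_ref, psi_ref, k=200.0):
    """Evaluate Eq. (1) for a peptide with len(phi) residues.

    phi, psi: per-residue dihedral angles in degrees (index 0 = residue 1).
    phi[0] and psi[-1] are ignored because these angles are undefined for
    the N- and C-terminal residues. The unit of k (kcal/mol per rad^2) is
    an assumption made for this sketch.
    """
    n = len(phi)
    total = 0.0
    for i in range(1, n):        # residues 2..n contribute a phi term
        total += math.radians(wrap_deg(phi[i] - phi_ref)) ** 2
    for i in range(0, n - 1):    # residues 1..n-1 contribute a psi term
        total += math.radians(wrap_deg(psi[i] - psi_ref)) ** 2
    return k * total
```

For an \(\hbox {A}_{5}\) peptide with all defined angles at the reference values, the function returns zero. A single angle displaced by one radian costs k.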

To obtain the potential energy surface of a residue inside the \(\hbox {A}_{5}\) peptide, the reference dihedral angles \(\phi _\mathrm{{ref}}\) and \(\psi _\mathrm{{ref}}\) were incrementally and independently altered by 4\(^{\circ }\) every 2 ns. First, the reference dihedral angle \(\psi _\mathrm{{ref}}\) was changed, while \(\phi _\mathrm{{ref}}\) was kept fixed at \(-60^{\circ }\), followed by a change of \(\phi _\mathrm{{ref}}\) at a fixed \(\psi _\mathrm{{ref}}\) value. The calculations resulted in multiple simulations where the values of the reference dihedral angles were separated by 4\(^{\circ }\). Each of these simulations was 2 ns long, corresponding to a total simulation time of 16.2 \(\upmu \)s.
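The quoted total simulation time is consistent with a complete scan of both reference angles. The sketch below assumes that each angle covers the full \(360^{\circ }\) range in 4\(^{\circ }\) steps. The text does not say this explicitly, but it reproduces the stated 16.2 \(\upmu \)s. The variable names are illustrative.

```python
# Enumerate the reference-angle grid of the potential energy surface scan.
STEP_DEG = 4     # spacing between neighbouring reference angles
WINDOW_NS = 2    # simulation time per (phi_ref, psi_ref) combination

# Assumption: full periodic range, starting at -180 degrees.
ref_values = list(range(-180, 180, STEP_DEG))
grid = [(phi_ref, psi_ref) for phi_ref in ref_values for psi_ref in ref_values]
total_us = len(grid) * WINDOW_NS / 1000.0
```

This yields 90 values per angle and 8100 windows in total, and the equilibration point \((-60^{\circ },-60^{\circ })\) is one of the grid points.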

The potential energy for a residue was calculated for each combination of the angles \(\phi \) and \(\psi \) as the time-averaged energy of the central residue in the \(\hbox {A}_{5}\) peptide. The contribution of the constraining potentials was excluded from the potential energy calculations.

3 Results

3.1 Conformational dynamics in the \(\hbox {A}_{11}\) peptide

Backbone dihedral angles \(\phi \) and \(\psi \) are the internal degrees of freedom that permit separating different conformations of the polypeptide [29, 30]. Figure 2A illustrates the probability density distribution of the dihedral angles recorded for the simulated \(\hbox {A}_{11}\) peptide. All residues inside the peptide, except the terminal ones, experience similar dynamics and have therefore contributed equally to the presented probability distribution. The distribution qualitatively resembles the distribution of dihedral angles found in experimental measurements of protein conformations [29, 30] by means of position and shape of the four most dominant backbone conformations. Regions with probability densities above \(10^{-5}\text { degrees}^{-2}\) correspond to typical conformations that are adopted by the residues of proteins [31, 43]. These regions are used for the definition of the four important domains in the (\(\phi \)-\(\psi \))-plane. The \(\alpha _R\) region embraces the right-handed \(\alpha \)-helices, the 3\(_{10}\)-helices and the \(\pi \)-helices [29]. The \(\beta \) region includes the \(\beta \)-strands and the P\(_{\mathrm{II}}\)-spirals [29]. The \(\alpha _L\) region is associated with the left-handed \(\alpha \)-helices [29], while the remaining P region describes the left-handed version of the P\(_{\mathrm{II}}\)-spirals [29]. Certain combinations of the \(\phi \) and \(\psi \) angles are sterically forbidden for a polypeptide resulting in the white areas shown in Fig. 2A.
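A probability density of this kind can be obtained by histogramming the sampled \((\phi ,\psi )\) pairs. The following is a minimal sketch and not the analysis code used for Fig. 2A. The bin width of 10\(^{\circ }\) and the function names are assumptions, while the threshold of \(10^{-5}\text { degrees}^{-2}\) is taken from the text.

```python
def dihedral_density(angles, bin_deg=10):
    """2D probability density (per degree^2) of (phi, psi) pairs in degrees."""
    n_bins = 360 // bin_deg
    counts = [[0] * n_bins for _ in range(n_bins)]
    for phi, psi in angles:
        # Shift to [0, 360) and guard against rounding onto the upper edge.
        i = min(int((phi + 180.0) % 360.0 // bin_deg), n_bins - 1)
        j = min(int((psi + 180.0) % 360.0 // bin_deg), n_bins - 1)
        counts[i][j] += 1
    norm = len(angles) * bin_deg * bin_deg
    return [[c / norm for c in row] for row in counts]


def populated_bins(density, threshold=1e-5):
    """Indices (i, j) of all bins whose density exceeds the threshold."""
    return [(i, j)
            for i, row in enumerate(density)
            for j, value in enumerate(row)
            if value > threshold]
```

Contiguous groups of populated bins would then be assigned to the \(\alpha _R\), \(\beta \), \(\alpha _L\) and P regions. That assignment step is not part of this sketch.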

Fig. 2

Possible conformations of alanine polypeptides. A Probability distribution of the backbone dihedral angles computed for the \(\hbox {A}_{11}\) peptide in solution. The white area indicates the sterically forbidden regions of conformations that could not be realised. B Definition of the different dihedral conformations in the \((\phi ,\psi )\)-plane according to the regions with probability densities above \(10^{-5}\,\text {degrees}^{-2}\) shown in A. Coloured contours encapsulate the dihedral angles associated with different peptide conformations (see text)

The studied \(\hbox {A}_{11}\) peptide was found in different conformations, which are characterised by the different backbone dihedral angles as defined in Fig. 2B.

Fig. 3

Examples for the four dominating backbone conformations of alanine polypeptides. Hydrogen bonds between backbone atoms and between the peptide and some exemplary water molecules are drawn with dashed lines, while their lengths are given in Ångström. The coordinates of the structures are taken from the potential energy surface calculations. The conformations are sorted as: A \(\alpha _R\) conformation, B \(\beta \) conformation, C \(\alpha _L\) conformation and D P conformation

Figure 3 illustrates peptides associated with the four backbone conformations that are defined in Fig. 2, including the peptide–peptide hydrogen bonds for the two helical conformations. The two \(\alpha \) conformations (Fig. 3A, C) can be recognised by the peptide–peptide hydrogen bond, whereas the \(\beta \) and P conformations (Fig. 3B, D) do not have any peptide–peptide hydrogen bonds due to the stretched nature of these conformations. Figure 4A illustrates the probabilities for the residues in the peptide to occupy these different conformations. The figure shows that the most probable conformation is the \(\beta \) conformation, with an average probability, computed for all residues except the terminal ones, of 0.6 at any temperature within the studied range. The other conformations, in order of decreasing occurrence probability, are the \(\alpha _R\), \(\alpha _L\) and P conformations. Remarkably, the probabilities for the residues in a peptide to occupy these other conformations also do not feature any pronounced temperature dependence in the temperature range of 300–350 K. Fluctuations in the occurrence probabilities of the \(\beta \) and \(\alpha _R\) conformations show a certain correlation because residues in the \(\hbox {A}_{11}\) peptide most often switch between these two conformations (see Fig. 4B). Figure 4A shows that a small residual probability of about 0.1 is observed, indicating that some residues in the peptide are found without any definite secondary structure (denoted as “other” in Fig. 4A). Table 1 compares the relative occupation of these states found in this work with results for \(\hbox {A}_{2}\) peptides. Here, the relative occupation of the helical conformations \(\alpha _R\) and \(\alpha _L\) is significantly larger than for an \(\hbox {A}_{2}\) peptide because the latter is too short to form stable helical structures.

Figure 4B shows the rate of conformational changes such as the \(\alpha _R \rightleftarrows \beta \) transitions. The rate for the \(\alpha _R \rightleftarrows \beta \) transitions is defined as the sum of the rates computed for the \(\alpha _R \rightarrow \beta \) and \(\beta \rightarrow \alpha _R\) transitions. All transition rates were found to be in detailed balance with negligible deviation, as expected for a sufficiently sampled peptide at equilibrium.
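The rate definition and the detailed balance check can be sketched for a single residue whose conformation has been labelled in every frame. The normalisation used here (number of events divided by the trajectory length) is an assumption. With it, detailed balance corresponds to equal forward and backward rates. All function names are hypothetical.

```python
from collections import Counter


def transition_counts(states):
    """Count a -> b changes between consecutive frames of one residue."""
    counts = Counter()
    for a, b in zip(states, states[1:]):
        if a != b:
            counts[(a, b)] += 1
    return counts


def transition_rates(states, frame_ns):
    """Rates in 1/ns: number of a -> b events per unit trajectory time."""
    total_ns = (len(states) - 1) * frame_ns
    return {pair: n / total_ns
            for pair, n in transition_counts(states).items()}


def combined_rate(rates, a, b):
    """Rate of the a <-> b transitions as the sum of both directions."""
    return rates.get((a, b), 0.0) + rates.get((b, a), 0.0)


def balance_deviation(rates, a, b):
    """Relative imbalance of forward and backward rates (0 = balanced)."""
    forward = rates.get((a, b), 0.0)
    backward = rates.get((b, a), 0.0)
    total = forward + backward
    return abs(forward - backward) / total if total else 0.0
```

For a well-sampled equilibrium trajectory, `balance_deviation` should approach zero for every pair of conformations.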

Fig. 4

Statistics of \(\hbox {A}_{11}\) polypeptide dihedral conformations and their influence on the adjacent residues. A Average occupation of different conformations by a residue in the \(\hbox {A}_{11}\) peptide averaged over time at specific temperature. A dihedral angle outside any of the regions defined in Fig. 2B is referred to as “other”. B Average rate of conformational transitions as a function of temperature. The rate of conformational transitions was computed for all residues and was averaged over the simulation time span at a specific temperature. C Conditional probability of a residue to have the \(\alpha _R\) conformation once a certain number of adjacent residues also has the \(\alpha _R\) conformation. The lines differentiate the direction in which the neighbouring residues are considered: the N-direction counts towards the N-terminal of the peptide and the C-direction counts towards the C-terminal of the peptide. D Similar to C but for the \(\beta \) conformation

The transition rates in Fig. 4B show that the overall peptide is most often found in a coil-like state in which the residues explore the \((\phi ,\psi )\)-plane. This coil-like behaviour is reflected in a frequent change of conformations of residues in the peptide. Summing all conformational change rates reveals the temperature-dependent timescale of the conformational changes. At 300 K, a conformational change of some residue is observed about twice per nanosecond, whereas at 350 K about 10 changes per nanosecond are observed. The transitions \(\alpha _R \rightleftarrows \beta \) occur most frequently and are temperature-dependent. The \(\beta \rightleftarrows \alpha _L\) transitions or the \(\alpha _L \rightleftarrows P\) transitions occur about once per nanosecond, while the \(\alpha _R \rightleftarrows \alpha _L\) and \(\alpha _R \rightleftarrows P\) transitions are rather rare. Note that about 80% of the residues in the \(\hbox {A}_{11}\) peptide are found in the left half of the \((\phi ,\psi )\)-plane, i.e. have \(\phi <0\). Therefore, conformational changes most likely occur clockwise in the \((\phi ,\psi )\)-plane, involving one of the \(\alpha _R\rightarrow \beta \rightarrow \alpha _L\rightarrow P\) transitions, or anti-clockwise, involving one of the \(P\rightarrow \alpha _L\rightarrow \beta \rightarrow \alpha _R\) transitions. Thus, the seemingly unordered dynamics of a solvated \(\hbox {A}_{11}\) peptide follows certain rules and is in fact not entirely random. Table 2 lists the expected transition rates between the conformations. A comparison with studies on \(\hbox {A}_{2}\) peptides in Table 3 reveals that the \(\hbox {A}_{11}\) peptide considered here performs torsional transitions at least one order of magnitude slower than the \(\hbox {A}_{2}\) peptide.
Furthermore, the fact that transition rates reported in the earlier studies [44,45,46] are not in detailed balance indicates that their respective transition rates were not converged. Thompson et al. experimentally observed helix propagation rates on the order of \(h=0.1\) ns\(^{-1}\) [47], which is 1/5 of the \(k(\beta \rightarrow \alpha _R)\) rate computed here and thus fulfils the requirement \(h<\sum _x k(x\rightarrow \alpha _R)\), where x denotes different conformations. This inequality must hold since a helix can only propagate as fast as a residue can change its backbone dihedral conformation, but not all changes will lead to stable helix propagation. Previously, inconsistencies between experimental helix propagation rates [47,48,49,50] and computational rates [47, 51, 52] were found since the latter were at least two orders of magnitude faster, which is not the case here.

Table 1 Relative population of different states from this work and from other works

Figure 4C, D illustrates the correlation of residues in the \(\hbox {A}_{11}\) peptide in terms of conditional probabilities of observing a residue in the \(\alpha _R\) or \(\beta \) conformation depending on the conformation of the neighbouring residues. Figure 4C shows the probability for a residue to have the \(\alpha _R\) conformation once the adjacent residues are also found in the \(\alpha _R\) conformation. Excluding the terminal residues in the peptide, each datapoint corresponds to residues with a different number of neighbouring residues in a certain conformation. Residues 3–9 were used to evaluate the conditional probabilities where a residue had 0 or 1 adjacent residues in a certain conformation, whereas for 4 adjacent residues only the middle residue 6 was used, due to the necessity of having sufficient neighbours.
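The conditional probabilities for one or more adjacent residues can be estimated by counting over labelled frames. The sketch below is a simplified illustration and not the analysis code of this work: it excludes terminal residues both as centre residues and as neighbours, which differs slightly from the residue ranges quoted above, and the function name is hypothetical.

```python
def conditional_probability(frames, state, n_adjacent, direction):
    """P(residue in `state` | n_adjacent neighbours on one side in `state`).

    frames: list of per-frame lists holding one conformation label per
            residue (index 0 = residue 1).
    direction: -1 counts neighbours towards the N-terminal,
               +1 counts neighbours towards the C-terminal.
    Intended for n_adjacent >= 1.
    """
    hits = 0
    total = 0
    for labels in frames:
        n = len(labels)
        for i in range(1, n - 1):
            neighbours = [i + direction * k for k in range(1, n_adjacent + 1)]
            if any(j < 1 or j > n - 2 for j in neighbours):
                continue  # not enough non-terminal neighbours on this side
            if all(labels[j] == state for j in neighbours):
                total += 1
                hits += labels[i] == state
    return hits / total if total else float("nan")
```

Calling the function with `direction=-1` and `direction=+1` gives the N-direction and C-direction estimates, respectively.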

Table 2 Transition rates k between different states from this work given in units of ns\(^{-1}\) for a temperature of 300 K
Table 3 Transition rates k between different states from this and other works given in units of ns\(^{-1}\)

For an arbitrary residue in the \(\hbox {A}_{11}\) peptide, the blue dotted line and the orange squares in Fig. 4C reveal that there is a probability of about 0.2 that the residue is found in the \(\alpha _R\) conformation once either the preceding or the following residue has a different conformation. The probability for a residue to be in the \(\alpha _R\) conformation increases to 0.4 once one neighbouring residue is also found in the \(\alpha _R\) conformation, and the probability becomes 0.6 once two preceding or following residues are found in the \(\alpha _R\) conformation. The yellow line in Fig. 4C shows that the probability of a residue to be in the \(\alpha _R\) conformation is 0.2 once neither of its neighbouring residues is found in the \(\alpha _R\) conformation. However, the probability increases to 0.75 if the neighbouring residues on both sides of a residue are found in the \(\alpha _R\) conformation.

The results in Fig. 4C show the effect of the \(\alpha _R\) conformation of adjacent residues in the N-terminal direction or the C-terminal direction on a given residue. The effect converges once 2 adjacent residues have an \(\alpha _R\) conformation, which indicates that at least 3 residues are necessary to initialise an \(\alpha \)-helix. As soon as 3 residues nucleate into a helix, the helical structure can extend further with an equal probability in any direction. The probability of a residue to be found in the \(\beta \) conformation (Fig. 4D) shows a relatively smaller correlation to adjacent residues in the \(\beta \) conformation and already saturates as soon as a residue has a single adjacent residue in the \(\beta \) conformation. In the case that no residues in both directions are found in the \(\beta \) conformation, the probability for the middle residue is 0.4, which is even lower than the respective probability in only the N-direction or the C-direction (0.5). If both neighbours are not in the \(\beta \) conformation, they are most likely to be in the \(\alpha _R\) conformation, in which case the middle residue is found in the \(\alpha _R\) conformation with a probability of 0.8 (see Fig. 4C). The strong correlation of residues in the \(\alpha _R\) conformation thus decreases the probability of finding the middle residue in the \(\beta \) conformation when both of its neighbours are not in the \(\beta \) conformation.

Figure 5A–D shows distributions of the so-called dihedral difference computed for residues 3–9 of the \(\hbox {A}_{11}\) polypeptide. The dihedral difference quantifies the proximity of the backbone dihedral angles of different residues. Once two neighbouring residues appear in the same conformation, the dihedral difference is small but is expected to increase once the residues adopt different conformations. All residues in a helical fragment of the peptide have similar backbone dihedral angles (within \(\alpha _R\)), so the dihedral differences within that fragment are expected to be minor (i.e. ca. 20\(^{\circ }\)).

The dihedral difference between two sets of dihedral angles (\(\phi _i\),\(\psi _i\)) and (\(\phi _j\),\(\psi _j\)) can formally be defined as a Cartesian two-dimensional distance which accounts for the periodicity of the angles \(\phi \) and \(\psi \) as follows

$$\begin{aligned} d(i,j)&= \min _{k,l \in \{0,1\} } \nonumber \\&\quad \left( \sqrt{( \mid \phi _i - \phi _j \mid - k \cdot 360^{\circ })^2 +( \mid \psi _i - \psi _j \mid - l \cdot 360^{\circ } )^2} \right) . \end{aligned}$$
(2)

Here, \(\phi _i\) and \(\psi _i\) are the dihedral angles measured for residue i, while \(\phi _j\) and \(\psi _j\) are the dihedral angles measured for residue j. The definition of \(d(i,j)\) respects the periodicity of the dihedral angles \(\phi \) and \(\psi \) and ensures that, for example, the dihedral difference between (170\(^{\circ }\), 0\(^{\circ }\)) and (\(-170^{\circ }\), 0\(^{\circ }\)) yields a value of 20\(^{\circ }\). The dihedral difference for a residue i calculated towards the N-terminal is defined as \(d(i,i-1)\), while the dihedral difference calculated towards the C-terminal is \(d(i,i+1)\), since the counting of residues in a polypeptide always goes from the N-terminal to the C-terminal.
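Equation (2) translates directly into a short function. The sketch below assumes that all angles are given in degrees within \([-180^{\circ },180^{\circ }]\), so that a single shift by \(360^{\circ }\) per coordinate is sufficient. The function name is illustrative.

```python
import math
from itertools import product


def dihedral_difference(phi_i, psi_i, phi_j, psi_j):
    """Eq. (2): periodic Euclidean distance in the (phi, psi)-plane (degrees)."""
    d_phi = abs(phi_i - phi_j)
    d_psi = abs(psi_i - psi_j)
    # Try all combinations of shifting each coordinate by 0 or 360 degrees
    # and keep the shortest distance.
    return min(
        math.hypot(d_phi - k * 360.0, d_psi - l * 360.0)
        for k, l in product((0, 1), repeat=2)
    )
```

For the example from the text, `dihedral_difference(170, 0, -170, 0)` evaluates to 20, because the shifted \(\phi \) difference \(340^{\circ }-360^{\circ }\) is shorter than the direct one.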

Fig. 5

Distribution of dihedral differences. A Dihedral difference \(d(i,i-1)\), Eq. (2), plotted over \(\phi \) for a residue i with respect to its neighbouring residues in the N-terminal direction. The residues \(i\in \{3,...,9\}\) from the \(\hbox {A}_{11}\) peptide were considered. MD frames, where the residue i is either in the \(\alpha _R\) (blue), \(\alpha _L\) (green) or outside any of the conformations defined in Fig. 2B (same blue as \(\alpha _R\), see text for an explanation), are included in the plot. Ellipsoids mark datapoints that correspond to non-helical neighbouring residues (orange), flexible helix-propagation (black) and rigid helix-propagation (red). B Similar to A, but showing \(d(i,i-1)\) as a function of \(\psi \). C Dihedral difference \(d(i,i+1)\) as defined in Eq. (2) plotted over \(\phi \). Otherwise similar to A. D Similar to C, but showing \(d(i,i+1)\) as a function of \(\psi \)

Figure 5A–D shows the dihedral differences calculated for residues 3–9 where the residues were found in the \(\alpha _R\) (blue), \(\alpha _L\) (green) conformations or without any particular structural motif. The datapoints for the latter scenario were plotted using the same colour (blue) as the datapoints for residues with the \(\alpha _R\) conformation to ensure that the observations about the formation of \(\alpha _R\) motifs in the peptide are a consequence of the internal dynamics of the polypeptide and not a consequence of a very particular definition of the \(\alpha _R\) conformation. Similar results for other polypeptide conformations, as well as three-dimensional plots showing the \(d(i,i\pm 1)\) dependence on \(\phi \) and \(\psi \), are presented in Figs. S1 and S2.

The data points shown in blue in Fig. 5 correspond to residues found in the \(\alpha _R\) conformation, thereby being part of a helical fragment in the \(\hbox {A}_{11}\) peptide. Helix propagation is the process where a helical fragment extends because residues neighbouring the fragment also adopt the \(\alpha _R\) conformation. During the helix propagation, the dihedral difference between the last helical residue i and the first non-helical residue decreases. This decrease can either be accompanied by fluctuations in the backbone dihedral angles of the last helical residue (flexible helix-propagation, black ellipsoids in Fig. 5) or keeping the backbone dihedral angles of the last helical residue more consistent (rigid helix-propagation, red ellipsoids in Fig. 5). For helix-propagation in the N-terminal-direction (see Fig. 5A, B), the flexible helix-propagation is more pronounced, while helix propagation in the C-terminal direction (see Fig. 5C, D) is dominated by the rigid helix-propagation. It is thus clear that there is a difference in the typical dynamics of helix propagation, although helix propagation is expected to occur with an equal probability in either direction (see Fig. 4C). The asymmetry in helix propagation is related to the asymmetry of the amino acids’ backbone. For the left-handed helices (\(\alpha _L\) conformation, green datapoints), the effect has not been observed.

The orange ellipsoids in Fig. 5A–D highlight the data points where a certain residue i has a helical conformation, while its neighbouring residue in the N-terminal direction (see Fig. 5A, B) or in the C-terminal direction (see Fig. 5C, D) does not. In these cases, a residue i with the \(\alpha _R\) conformation has a dihedral difference to the neighbouring residue j of \(d(i,j)>100^{\circ }\). Such a condition can be illustrated by considering circles in the \((\phi ,\psi )\)-plane centred somewhere in the \(\alpha _R\) region and having a radius of 100\(^{\circ }\) (see Fig. S3). When \((\phi _i,\psi _i)\) is the centre of such a circle, then \((\phi _j,\psi _j)\) lying outside of the circle is logically equivalent to \(d(i,j) > 100^{\circ }\). Thus, the datapoints which are encapsulated by the orange ellipsoid lie outside any of these circles.

A suitable description of the overall conformation of a peptide can be achieved through defining its helical content. The simulated \(\hbox {A}_{11}\) peptide was initialised as an ideal \(\alpha \)-helix. The initial helical structure becomes distorted after around 170 ns at 300 K (420 ns if equilibration is included) and, over the course of the performed MD simulation, multiple folding and unfolding events were observed (see Fig. 6A). The performed analysis illustrates that the \(\hbox {A}_{11}\) peptide does not form long-lived stable helices in the studied temperature range. With increasing temperature, helical fragments occurred more often but also broke apart on a shorter timescale. This behaviour corresponds to an increased flexibility of the peptide, which can be quantified through the increase in its conformational changes (see Fig. 4B). Figure 6B–D shows the statistics of helical stability, namely the probability of observing a certain number of helical residues (see Fig. 6B), the probabilities for changes in the length of the helical fragment over one picosecond (see Fig. 6C) and the expected lifetime of helical fragments (see Fig. 6D). Figure 6E shows the average potential energies of helical fragments of certain length. The analysis in Fig. 6B–E included the simulation results obtained at temperatures 302–350 K, so the initial helical structure which persisted through a part of the production simulation at 300 K was not considered. Helical fragments are most frequently formed from 3 residues (see Fig. 6B), although such constructs turn out to be very short-lived, having a probability of 0.5 to break apart after 1 ps (see Fig. 6C) and a very short lifetime even up to the 95th percentile (see Fig. 6D). Similarly, helical fragments consisting of 4 residues were also found to be rather unstable. With at least 5 residues in a helical formation, the number of helical residues in the peptide had a reasonable chance to persist (see Fig. 6C), and remarkably the \(\hbox {A}_{11}\) peptide with 5 helical residues does not further stabilise when the helical content is further increased.
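Statistics of the kind shown in Fig. 6C, D can be derived from a time series of the number of helical residues. The following sketch assumes one value per frame with a frame spacing of 1 ps. The function names and the input format are hypothetical and do not reproduce the actual analysis scripts.

```python
from collections import defaultdict


def change_probabilities(helical_counts):
    """Per helical content n: fractions of frame pairs in which the
    content increases, persists or decreases in the next frame."""
    tallies = defaultdict(lambda: [0, 0, 0])  # [increase, persist, decrease]
    for now, nxt in zip(helical_counts, helical_counts[1:]):
        if nxt > now:
            tallies[now][0] += 1
        elif nxt == now:
            tallies[now][1] += 1
        else:
            tallies[now][2] += 1
    return {n: tuple(c / sum(t) for c in t) for n, t in tallies.items()}


def lifetimes(helical_counts, frame_ps=1.0):
    """Durations (ps) of uninterrupted runs of each helical content."""
    runs = defaultdict(list)
    start = 0
    for idx in range(1, len(helical_counts) + 1):
        ended = idx == len(helical_counts)
        if ended or helical_counts[idx] != helical_counts[start]:
            runs[helical_counts[start]].append((idx - start) * frame_ps)
            start = idx
    return dict(runs)
```

Averages and percentiles of the lists returned by `lifetimes` would correspond to the quantities plotted in Fig. 6D.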

Fig. 6

Statistics on the number of helical residues in the \(\hbox {A}_{11}\) peptide. A The number of helical residues in the \(\hbox {A}_{11}\) peptide, plotted over temperature and simulation time. The temperature was increased stepwise from 300 K to 350 K every 250 ns, so the plot shows a single simulation where the temperature was increased. B Probability distribution of the number of helical residues in the \(\hbox {A}_{11}\) peptide. The probability not to contain any helical residues, P\(_{0}\), is indicated. Since at least 3 residues are necessary to form a helical structure, it is not possible to find only 1 or 2 helical residues, i.e. P\(_{1}\) = P\(_{2}\) = 0. C Probability of a peptide with a certain number of helical residues to increase, persist or decrease in the helical content within the next 1 ps. D Lifetime of a certain number of helical residues in the peptide. The average value is plotted in black, while the 40–95 percentiles are shown with colour. E Average interaction energies within the peptide (blue, solid line), between the peptide and the solvent (orange, dotted line) and the sum of these interactions (yellow, dashed line) as a function of the helical content of the peptide

Configurations with longer helices occur less frequently compared to those having 3 helical residues since there are fewer ways to realise a longer helix in the \(\hbox {A}_{11}\) peptide. Peptides with 6 helical residues turn out to be significantly more stable than fragments with 3 or 4 helical residues (see Fig. 6C, D), which is reflected in the probability to find helical fragments with 6 helical residues (see Fig. 6B). The interaction energies in Fig. 6E show another peculiarity of intermediately sized helices. A peptide with 6 helical residues typically has stronger interactions within the peptide and weaker interactions with the solvent. This fact indicates that an \(\hbox {A}_{11}\) peptide with 6 helical residues is found in a conformation in which the contact with the solvent is avoided, an effect that can be attributed to the hydrophobicity of alanine and a possible proximity of the oppositely charged terminal residues.

The sum of interaction energies in Fig. 6E shows that there is an energetic advantage for the peptide in forming helical structures. There is, however, no advantage in forming longer helices, indicating that the \(\hbox {A}_{11}\) peptide is mostly observed in the random coil-like state already at 300 K.
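Statistics of the kind shown in Fig. 6C, D can be tabulated from a per-frame count of helical residues. The following is a minimal sketch under stated assumptions, not the analysis code used for the figures: the function names and the toy trajectory are hypothetical, frames are assumed to be spaced 1 ps apart, and runs cut off at the start or end of the trajectory are counted as if they were complete.

```python
from collections import Counter, defaultdict


def transition_stats(counts):
    """For each helical-residue count h, estimate the probability that the
    next frame has more, the same, or fewer helical residues."""
    tally = defaultdict(Counter)
    for now, nxt in zip(counts, counts[1:]):
        if nxt > now:
            tally[now]["increase"] += 1
        elif nxt < now:
            tally[now]["decrease"] += 1
        else:
            tally[now]["persist"] += 1
    return {
        h: {key: c[key] / sum(c.values())
            for key in ("increase", "persist", "decrease")}
        for h, c in tally.items()
    }


def lifetimes(counts, dt_ps=1.0):
    """Collect the durations (in ps) of uninterrupted runs of each count."""
    runs = defaultdict(list)
    start = 0
    for i in range(1, len(counts) + 1):
        # A run ends at the end of the trajectory or when the count changes.
        if i == len(counts) or counts[i] != counts[start]:
            runs[counts[start]].append((i - start) * dt_ps)
            start = i
    return dict(runs)


# Toy trajectory with made-up numbers, one frame per ps (hypothetical data).
toy = [0, 0, 3, 3, 4, 3, 0, 0, 0, 6, 6, 6, 5, 0]
print(transition_stats(toy))
print(lifetimes(toy))
```

Averages and percentiles of the lists returned by `lifetimes` would then give curves analogous to Fig. 6D.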

3.2 Statistical mechanics model

Steered MD simulations of the \(\hbox {A}_{5}\) peptide were performed in explicit water to calculate the potential energy surface of a residue in the polypeptide. The approach used here computes the energy surface of the middle residue in the variable space in which all five residues adopt the same backbone dihedral angles. It is therefore assumed that the secondary structures of interest (energy minima on the surface) show similar backbone dihedral angles for all the residues involved and that these structures are locally stabilised by interactions with residues not further apart than two residues. An \(\hbox {A}_{5}\) peptide was chosen because two adjacent helical residues in each direction are sufficient for the convergence of the conditional probabilities to be helical (see Fig. 4C) and of helix stability (see Fig. 6C). The resulting potential energy surface of the central residue is shown in Fig. 7A, which describes the potential energy of a residue inside a solvated alanine peptide computed as a function of the twisting dihedral angles \(\phi \) and \(\psi \). The potential energy surface was separated into the contributions describing a bound alanine, as illustrated in Figs. S4 and S5. The energy landscape in Fig. 7A corresponds to the probability density of the \(\hbox {A}_{11}\) peptide conformations shown in Fig. 2A, observed for the unconstrained MD simulations with explicit water. The agreement shows that the approach for calculating the potential energy surface is reasonable and permits establishing the potential energy of a residue with respect to its twisting degrees of freedom. The result in Fig. 7A shows that a potential energy of not more than 6 kcal/mol is sufficient for a residue to freely switch between the \(\alpha _R\) and \(\beta \) conformations, while a potential energy of 8 kcal/mol is needed for the \(\beta \rightleftarrows \alpha _L\) transition and a potential energy of 10 kcal/mol is necessary for the \(\alpha _L\rightleftarrows P\) transition. It is therefore natural that the frequency of the conformational changes \(\alpha _R\rightleftarrows \beta \) in Fig. 4B is the largest, followed by the \(\beta \rightleftarrows \alpha _L\) transitions and finally by the \(\alpha _L\rightleftarrows P\) transitions. Other transitions are suppressed due to energy barriers of 12 kcal/mol and above. The potential energy barriers here are higher by a factor of 2–3 compared to previous results reported for implicitly solvated \(\hbox {A}_{2}\) [46]. In the present investigation, the larger change in potential energy is due to the simultaneous steered movement of multiple residues.
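The ordering of the transition frequencies can be illustrated with a simple Boltzmann-weight estimate. The sketch below is an illustration under assumptions that are not part of the analysis above: it treats the quoted potential energies as activation energies in an Arrhenius-like picture, ignores entropic contributions and the populations of the starting conformations, and uses the hypothetical helper name `boltzmann_weight`.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def boltzmann_weight(barrier_kcal_mol, temperature_k=300.0):
    """Boltzmann factor exp(-E / (k_B T)) for a barrier E given in kcal/mol."""
    return math.exp(-barrier_kcal_mol / (K_B * temperature_k))


# Barrier values quoted in the text (kcal/mol); the Arrhenius-like reading
# of these numbers is an assumption made only for this illustration.
barriers = {
    "alpha_R <-> beta": 6.0,
    "beta <-> alpha_L": 8.0,
    "alpha_L <-> P": 10.0,
}

reference = boltzmann_weight(barriers["alpha_R <-> beta"])
for name, energy in barriers.items():
    ratio = boltzmann_weight(energy) / reference
    print(f"{name}: weight relative to alpha_R <-> beta = {ratio:.2e}")
```

In this simplified picture, each additional 2 kcal/mol lowers the weight by a factor of roughly 30 at 300 K, which is consistent with the qualitative ordering seen in Fig. 4B but should not be read as a quantitative rate prediction.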

Fig. 7
figure 7

Potential energy surface of the middle residue inside the solvated \(\hbox {A}_{5}\) peptide calculated using steered MD and its statistical model interpretation. A Average potential energy surface of the middle residue inside the solvated \(\hbox {A}_{5}\) peptide computed as a function of the two twisting dihedral angles \(\phi \) and \(\psi \), with the potential energies given relative to the global energy minimum. B Potential energy surface describing the bound state of the residue inside a peptide, where energies are calculated with respect to the local minimum in the \(\alpha _R\) conformation as defined in Fig. 2. The highest energy of the bound state is characterised by the hydrogen bond energy \(E_\mathrm{{HB}}\). C Potential energy surface of the unbound state of a residue inside a peptide. The lowest energy of the unbound state matches the hydrogen bond energy \(E_\mathrm{{HB}}\). D Helicity of peptides of different lengths. Helicity was calculated within the framework of the statistical model using the potential energy surfaces shown in B and C, which represent the bound and the unbound states, respectively. The numbers indicate the length of the peptides. E Temperature dependence of the helicity calculated using the statistical model (lines), compared to the experimentally measured helicity of an alanine-rich peptide consisting of 21 amino acids (blue triangles) [47] and to the average helicity obtained from the MD simulations of the \(\hbox {A}_{11}\) peptide (red circles)

The folding \(\leftrightarrow \) unfolding dynamics of the polypeptide can be described through a two-state statistical mechanics model using the potential energy surface shown in Fig. 7A [26, 27]. The potential energy surface needs to be separated into the potential energy surface of the bound state, corresponding to an \(\alpha _R\) helix conformation, and that of the unbound state, corresponding to the random coil state in which residues constantly switch conformations. For such a description, the minimum of the \(\alpha _R\) conformation should first be set to 0 kcal/mol. The energy necessary to break a single hydrogen bond, \(E_\mathrm{{HB}}\), can be used as a threshold value to separate the bound and unbound conformations; the procedure is illustrated in Fig. S6. The potential energy surface within a threshold value \(E_\mathrm{{HB}}\) counted from the \(\alpha _R\) minimum therefore describes the bound state (see Fig. 7B), while the potential energy greater than \(E_\mathrm{{HB}}\) represents the unbound state (see Fig. 7C). The statistical mechanics model is sensitive to the choice of \(E_\mathrm{{HB}}\). This stems mainly from the helix initiation factor \(\beta \) defined as [26]:

$$\begin{aligned} \beta = \exp \left( -\frac{3E_\mathrm{{HB}}}{k_\mathrm{{B}} T}\right) , \end{aligned}$$
(3)

where \(k_\mathrm{{B}}\) is the Boltzmann constant and T is the temperature of the system. Hydrogen bond energies in solvated peptides in simulation and experiment are expected to be in the range of 0.5–1.5 kcal/mol and are significantly weaker than the hydrogen bonds in vacuum [53, 54]. A characteristic value of the hydrogen bond energy of \(E_\mathrm{{HB}}=1.65\,{\text {kcal/mol}}\) was used here, corresponding to the value computed in explicit water using the CHARMM force field [53]. The partition function for a residue of the peptide that has one of the two possible conformations can be constructed as:

$$\begin{aligned}&Z_b = \iint _{K_{E_\mathrm{{HB}}}} \exp { \left( -\frac{\epsilon _b (\phi ,\psi )}{k_\mathrm{{B}} T} \right) } \hbox {d}\phi \hbox {d}\psi , \end{aligned}$$
(4)
$$\begin{aligned}&Z_u = \int _{-180}^{180} \int _{-180}^{180} \exp { \left( -\frac{\epsilon _u (\phi ,\psi )}{k_\mathrm{{B}} T} \right) } \hbox {d}\phi \hbox {d}\psi . \end{aligned}$$
(5)
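Equations (4) and (5) can be evaluated numerically on a regular grid of dihedral angles. The sketch below is a minimal illustration and not the procedure of Fig. S6: `toy_surface` is a hypothetical single-well surface standing in for the steered MD result, the bound region \(K_{E_\mathrm{{HB}}}\) is approximated as all grid points with an energy of at most \(E_\mathrm{{HB}}\) (sufficient for a single well, where every such point is connected to the minimum), and the unbound-state energy is assumed to be the surface clamped from below at \(E_\mathrm{{HB}}\).

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def toy_surface(phi, psi):
    """Hypothetical energy surface in kcal/mol: one well at (-60, -45) degrees
    with its minimum set to zero, capped by a flat 4 kcal/mol plateau."""
    return min(4.0, 0.002 * ((phi + 60.0) ** 2 + (psi + 45.0) ** 2))


def partition_functions(surface, e_hb, temperature_k, step=5.0):
    """Midpoint-rule estimates of Z_b (Eq. 4) and Z_u (Eq. 5) in degrees^2."""
    kt = K_B * temperature_k
    n = int(round(360.0 / step))
    area = step * step
    z_b = 0.0
    z_u = 0.0
    for i in range(n):
        phi = -180.0 + (i + 0.5) * step
        for j in range(n):
            psi = -180.0 + (j + 0.5) * step
            energy = surface(phi, psi)
            if energy <= e_hb:
                # Assumption: bound region = points not higher than E_HB.
                z_b += math.exp(-energy / kt) * area
            # Assumption: unbound energy is never lower than E_HB.
            z_u += math.exp(-max(energy, e_hb) / kt) * area
    return z_b, z_u


z_b, z_u = partition_functions(toy_surface, e_hb=1.65, temperature_k=300.0)
print(f"Z_b = {z_b:.1f}, Z_u = {z_u:.1f}, Z_b/Z_u = {z_b / z_u:.2f}")
```

For a real multi-well surface, the bound region would additionally have to be restricted to the area reachable from the \(\alpha _R\) minimum without exceeding \(E_\mathrm{{HB}}\), for example with a flood fill on the grid.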

The partition function of a residue in the bound state, \(Z_b\), follows Boltzmann statistics and includes the potential energy of the bound state \(\epsilon _b\) shown in Fig. 7B. The partition function is built from an integral over \(K_{E_\mathrm{{HB}}}\), which is the area accessible from the minimum in \(\alpha _R\) limited by the maximal energy \(E_\mathrm{{HB}}\). In Fig. 7B, the area \(K_{E_\mathrm{{HB}}}\) is coloured. For the partition function of the unbound state, \(Z_u\), the integration is performed over the whole span of the dihedral angles \(\phi \) and \(\psi \), considering the energies \(\epsilon _u\) shown in Fig. 7C. The partition functions \(Z_b\) and \(Z_u\) of a residue permit constructing the partition function of a peptide of length n. Employing the approximation of a single helical fragment, the partition function of a peptide can be constructed for the purpose of calculating the expectation value of helicity. The theory developed by Yakubovich et al. [26] was adapted here to consider 3 residues necessary for helix nucleation and to allow a fully helical peptide. The partition function can then be written as:

$$\begin{aligned} {\mathbb {Z}} = Z_u^n + \beta \sum _{k=3}^{n} (n-k+1) Z_b^k Z_u^{n-k}. \end{aligned}$$
(6)

Here, \(\beta \), \(Z_b\) and \(Z_u\) are defined in Eqs. (3)–(5). The first term, \(Z_u^n\), accounts for the possibility of a peptide to have all residues in the unbound state. The summation is performed over k, the number of residues in the helical fragment, which is required to be at least 3 to be recognised as a helix and can maximally be n, i.e. the number of amino acids in the peptide. The term \((n-k+1)\) counts the number of ways to fit a helix of length k inside a peptide of length n. Using the partition function in Eq. (6), it is possible to derive the expectation value for the fraction of helical residues \(f_H\) in a peptide as:

$$\begin{aligned} f_H = \frac{1}{{\mathbb {Z}}} \beta \sum _{k=3}^n \frac{k}{n} (n-k+1) Z_b^{k} Z_u^{n-k}. \end{aligned}$$
(7)

The fraction k/n defines the helical fraction of k residues in a peptide of length n. Other terms are the same as defined in Eq. (6). The Zimm–Bragg [55] parameters can be inferred from this model with the helix-propensity s given as \(s=Z_b/Z_u\) and the helix nucleation parameter \(\sigma =\beta s^{t}\) where t depends on the number of residues necessary to initiate a helix [27]. Note that thermodynamical characteristics are calculated with a different approximation for the partition function [26].
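Equations (3), (6) and (7) translate directly into a few lines of code. The sketch below assumes that \(Z_b\) and \(Z_u\) have already been obtained from Eqs. (4) and (5); the numerical values passed in the usage example are placeholders, not results of this work, and the function names are hypothetical. Numerator and denominator of Eq. (7) are divided by \(Z_u^n\), so that only the ratio \(s=Z_b/Z_u\) enters, which leaves \(f_H\) unchanged and avoids very large intermediate numbers.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def helix_initiation(e_hb, temperature_k):
    """Helix initiation factor beta of Eq. (3), with E_HB in kcal/mol."""
    return math.exp(-3.0 * e_hb / (K_B * temperature_k))


def helical_fraction(n, z_b, z_u, beta):
    """Expected helical fraction f_H of Eq. (7) for a peptide of length n."""
    s = z_b / z_u
    lengths = range(3, n + 1)
    # Statistical weight of a single helix of length k, relative to Z_u^n.
    weights = [beta * (n - k + 1) * s ** k for k in lengths]
    z_total = 1.0 + sum(weights)  # Eq. (6) divided by Z_u^n
    return sum(k / n * w for k, w in zip(lengths, weights)) / z_total


# Usage with placeholder values for Z_b and Z_u (hypothetical numbers).
beta = helix_initiation(e_hb=1.65, temperature_k=300.0)
for n in (11, 21, 35):
    print(n, helical_fraction(n, z_b=3.0, z_u=1.0, beta=beta))
```

In a temperature scan, \(\beta \), \(Z_b\) and \(Z_u\) all need to be recomputed at each temperature before calling `helical_fraction`.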

Equation (7) is used to compute the expected helicity for peptides of different lengths (see Fig. 7D). The helicity drops with increasing temperature due to the thermal fluctuations in the system. For peptides containing 8–21, 35 and 99 amino acids, the temperature dependence of the helicity was plotted explicitly, while other peptide lengths are indicated by a colour gradient. For small peptides, the helix-initiation factor \(\beta \) dominates the resulting helicity, while for longer peptides the difference in entropy between \(Z_b\) and \(Z_u\) becomes the most relevant contribution, which causes a steep slope. Longer peptides are expected to have a higher helical fraction, which drops at the latest around 350 K. The helicity curves for longer peptides become closely packed; the addition of another residue causes only a minor change in the behaviour of the whole peptide.

The helicity calculated within the two-state statistical model for an \(\hbox {A}_{21}\) peptide was compared with experimental measurements in Fig. 7E, showing a reasonable agreement. Between 270 and 300 K, the statistical model overestimates helicity compared to experiment [47], probably due to the higher helix propensity of alanine (A) compared to arginine (R) [56]. The latter was present in the alanine-rich peptide used in the experiment. It is also possible that the present model needs to be extended such that it also includes the \(\beta \) conformations of the amino acids in a similar way to the \(\alpha _R\) conformation studied here. These additional states in the partition function would not contribute to the helicity and would reduce the helical fraction overall. A study on four slightly different 24-residue-long peptides containing 16 alanines found helical fractions of \(f_\mathrm{{H}}=\{0.61, 0.61, 0.54, 0.53\}\) at 277 K [57], which is close to the prediction of the statistical model for an \(\hbox {A}_{16}\) peptide of \(f_\mathrm{{H}}=0.6\) but significantly lower than the prediction for the \(\hbox {A}_{24}\) peptide of \(f_\mathrm{{H}}=0.9\).

Replica Exchange MD performed on an \(\hbox {A}_{21}\) peptide resulted in an approximately linear dependence of helicity on temperature and \(f_\mathrm{{H}}=0.5\) at 350 K, but the deviations can probably be attributed to the parameterisation of the \(\phi \) and \(\psi \) angles [58]. Comparison of the helicity values for the \(\hbox {A}_{11}\) peptide with the helicity data gathered from MD shows that the MD simulations yield helicity values that fluctuate around the value obtained from the statistical model. The comparison illustrates that, in order to gather consistent results about the helicity of the \(\hbox {A}_{11}\) peptide, simulations on the order of at least a few microseconds are necessary for each temperature of interest. The necessary simulation time for sufficient sampling is expected to increase further for longer peptides due to the larger conformational space.

The potential energy surface of a residue utilised in the presented two-state statistical model, obtained from steered MD of the \(\hbox {A}_{5}\) peptide, only includes interactions with the solvent and with the two neighbouring residues in both the N- and C-terminal directions. The correlation of neighbouring residues in the \(\alpha _R\) conformation (see Fig. 4) influences the next two adjacent residues, making this approximation reasonable. For polypeptides composed of different amino acids, such an approach could, however, yield inadequate results. Interactions of longer side-chains would make it necessary to simulate longer peptides in order to calculate the correct potential energy surfaces.

Comparison of helicity calculated with the use of the statistical model with experimental measurements or simulations of long polypeptides demands further caution. For sufficiently long polypeptides, tertiary structures would emerge, which are not included in the two-state model. Longer peptides will also include several helical fragments, which would alter the statistics and the results. Multiple helical states are not considered in the present study for the following reasons. Firstly, any additional helical fragment that can be considered independent of the other one should be at least 3 residues apart from it. The first peptide in which two helical fragments might occur is therefore the \(\hbox {A}_{9}\) peptide, having the secondary structure 3\(\times \)Helix - 3\(\times \)Coil - 3\(\times \)Helix. This represents only a single state with two helical fragments compared to \(\sum _{k=3}^{9} (9-k+1)=28\) states with a single helical motif, following Eq. (6). Secondly, the boundary amino acids of helical residues are less flexible, which suppresses the weight of multiple helical fragments due to the loss in entropy [27]. Thirdly, the helix-initiation factor needs to be accounted for at least twice when a peptide has more than one helical fragment, which also reduces the statistical weight of such helix-composite structures. To conclude, the presented theory in the single-helix approximation should be sufficient for peptide lengths at which tertiary structures are not dominant.
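The state counting for the \(\hbox {A}_{9}\) peptide can be checked by direct enumeration. The following sketch counts the placements of one helix and of two helices in a peptide of length n, assuming, as in the argument above, a minimum helix length of 3 residues and a minimum separation of 3 coil residues between two helices; the function names are hypothetical, and the counts ignore the statistical weights of the states.

```python
def single_helix_states(n, k_min=3):
    """Number of ways to place one helix of length k >= k_min in n residues,
    i.e. the sum over (n - k + 1) appearing in Eq. (6)."""
    return sum(n - k + 1 for k in range(k_min, n + 1))


def two_helix_states(n, k_min=3, gap_min=3):
    """Number of ways to place two helices, each of at least k_min residues,
    separated by at least gap_min coil residues."""
    count = 0
    for start1 in range(n):
        # The first helix occupies residues start1 .. end1 - 1.
        for end1 in range(start1 + k_min, n + 1):
            for start2 in range(end1 + gap_min, n):
                # The second helix occupies residues start2 .. end2 - 1.
                for end2 in range(start2 + k_min, n + 1):
                    count += 1
    return count


print(single_helix_states(9))  # 28 single-helix states
print(two_helix_states(9))     # 1 state with two helices
print(two_helix_states(8))     # 0: too short for two separated helices
```

The number of two-helix placements grows quickly with n, so this purely combinatorial argument applies to short peptides; for longer ones, the entropic and initiation-factor arguments given above carry more weight.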

4 Conclusion

The dynamics of a solvated alanine polypeptide was studied with the use of atomistic MD simulations and a statistical approach. It was shown that the backbone dihedral angles of the residues in the polypeptide chain cluster in conformations that are commonly associated with certain secondary structure motifs in proteins. The dynamics of the backbone dihedral angles was characterised by conformational transitions, of which the \(\alpha _R\rightleftarrows \beta \) transitions turned out to be the dominating ones, while the \(\beta \rightleftarrows \alpha _L\) and \(\alpha _L\rightleftarrows P\) transitions occurred less frequently. The results of the simulations revealed that prolongation of helical fragments in the direction of the N-terminal or C-terminal differs with respect to the statistical occupation of different folding pathways. Furthermore, helix propagation in the C-terminal direction was described as rigid helix propagation, which might extrapolate to more complex peptides as well and thus provide an explanation why proteinogenesis from the N-terminal to the C-terminal could be advantageous [59].

Once folded, helical fragments showed a short lifetime on the order of ca. 1–20 picoseconds. The investigation revealed that the helix stability increased significantly once the helical fragment was extended from 3 to 5 residues. Given the assumption that during the folding process secondary structures form first and then assemble into tertiary structures, helical fragments of 5 or fewer residues are expected to be insufficiently stable to support the formation of tertiary structures. This indicates a possible identification of those parts of a protein which nucleate the folding process: only helical fragments longer than 5 residues could be considered as nucleation candidates for the formation of tertiary structures.

The four conformations that encompass 90% of the probability density in the \((\phi ,\psi )\)-plane coincide with local minima in the potential energies computed here. These minima are separated by energy barriers of less than 4 kcal/mol. The potential energy surfaces for alanine residues were used to construct a partition function that permitted calculating the fractional helicity of the peptide. The predictions of the statistical model revealed that most residues in short peptides, with up to 15 amino acids in length, are expected to be in a flexible coil-like conformation at a temperature of 300 K and that alanine polypeptides of an arbitrary length are not able to hold a stable \(\alpha \)-helix conformation at temperatures above ca. 350 K. Persistent helices beyond that temperature are possible; however, these helices are expected to be stabilised by the formation of tertiary structures.