Even with current computer power and improved molecular simulation codes, obtaining adequate sampling remains an issue in atomistic simulations in explicit solvent. This is particularly true for IDPs, with their high conformational heterogeneity. Several methods have been developed to overcome this sampling issue and these methods can be divided into two groups: unbiased (e.g., temperature replica exchange, solute tempering) and biased (e.g., umbrella sampling, metadynamics) enhanced sampling simulations. The ensemble of conformations generated with these methods is typically analyzed by clustering the data based on structural similarities or calculating the probability of observables. Here we will discuss the strengths and weaknesses of several replica exchange and metadynamics-based implementations, which have been used extensively to characterize ordered proteins, when applied to IDPs. The application of another enhanced sampling method, multicanonical ensemble MD, to study IDPs has recently been reviewed elsewhere (Ikebe et al. 2016).
Replica exchange
In a temperature replica exchange MD (tREMD) simulation (Sugita and Okamoto 1999), multiple copies (replicas) of the system are simulated in parallel, all at different temperatures (see Fig. 2). At least one of the replicas should be at the temperature of interest and at least one should be at a temperature high enough to rapidly overcome free-energy barriers between metastable states. The high-temperature replica allows for fast sampling of conformational space and, as exchanges between replicas adjacent in temperature take place, the conformational ensemble at the temperature of interest can be built. The exchanges are governed by a Monte Carlo scheme and detailed balance is ensured by applying the Metropolis rule:
$$P(\mathrm{acc}) = \min\left(1, \exp(\delta \beta\, \delta E)\right), $$
where \(\delta \beta\) is the difference in inverse temperature and \(\delta E\) is the difference in potential energy between the replicas.
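As a minimal sketch of the Metropolis rule above, the acceptance probability for a swap between two replicas can be computed as follows. The function name, the argument names and the choice of kJ/mol energy units are assumptions made for illustration; they are not taken from any of the simulation packages cited here.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K); unit choice is an assumption


def exchange_probability(temp_i, temp_j, energy_i, energy_j):
    """Metropolis acceptance probability for swapping replicas i and j.

    temp_*   : replica temperatures in K
    energy_* : instantaneous potential energies in kJ/mol
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta_beta = beta_j - beta_i          # difference in inverse temperature
    delta_energy = energy_j - energy_i    # difference in potential energy
    exponent = delta_beta * delta_energy
    # min(1, exp(x)) without evaluating exp for large positive x
    return 1.0 if exponent >= 0.0 else math.exp(exponent)
```

With this sign convention, a swap is always accepted when the hotter replica currently holds the lower-energy configuration, and is accepted with a probability below one otherwise.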
The tREMD method is routinely available in popular biomolecular simulation packages, including NAMD (Phillips et al. 2005), AMBER (Salomon-Ferrer et al. 2013), and GROMACS (Abraham et al. 2015). tREMD simulations are particularly useful as an explorative tool to efficiently sample the free-energy landscape and as such have seen much use in characterizing the conformational ensemble of IDPs. So far, tREMD simulations have primarily been used to characterize the conformational ensemble of IDPs in the unbound state.
One of the first applications of tREMD to study IDPs highlighted the importance of transient formation of secondary structure in the Aβ40 and Aβ42 peptides (Sgourakis et al. 2007). These peptides both form amyloid fibrils found in plaques in the brains of patients with Alzheimer’s disease. Aβ42 is far more aggregation-prone and neurotoxic than Aβ40. The extensive sampling provided by the tREMD simulations shows that the conformational ensemble of Aβ42 is much more diverse and, contrary to Aβ40, transient secondary structure formation is largely found in the C-terminus of the peptide, emphasizing the importance of the two extra C-terminal residues. Similarly, transient formation of secondary structure elements has also been demonstrated for the highly flexible N- or C-terminal protrusions of histone proteins, where states with high secondary structure content may be important for interaction with linker DNA (Potoyan and Papoian 2011). In this latter study, 50–54 replicas, depending on system size, spanning a temperature range of 300–450 K, were simulated for 55–60 ns/replica.
As many IDPs are structured when bound to their interaction partners, the observation of transient secondary structure formation raises the question of whether the binding partner selects and stabilizes a conformation transiently sampled by the IDP in isolation. Support for this mechanism of binding came from extensive tREMD simulations of the NCBD protein in implicit (Zhang et al. 2012) and explicit (Knott and Best 2012) solvent. This nuclear coactivator binding domain of the transcriptional coactivator CBP is disordered in solution but folds into a triple helix when bound to its binding partner ACTR. The more accurate explicit solvent simulations used 48 replicas to span a temperature range of 304–424 K with a simulation time of 250 ns/replica. In agreement with experimental data, a large proportion of residual helical structure was found in the unbound ensemble, particularly in the two terminal helices. The middle helix is observed less frequently and not in conjunction with the two terminal helices, pointing to a mechanism that is part conformational selection and part induced folding upon interaction with ACTR.
However, the residual structure observed in tREMD simulations does not always coincide with the structure observed in the bound conformation. Miller et al. studied the conformational ensemble of several IAPP variants using 40 replicas to span a temperature range of 300–575 K with a simulation time of 200 ns/replica (Miller et al. 2013). The human IAPP peptide (hIAPP) forms β-sheet-rich amyloid fibrils but in solution tREMD simulations, a relatively high α-helical content was noted. Notably, the degree of residual helical structure observed for variants of the peptide correlated with the aggregation propensity, with less aggregation-prone variants having more structural flexibility (Miller et al. 2013).
The flexible nature of IDPs makes them highly amenable to regulation via post-translational modifications, including phosphorylation and glycosylation (Babu et al. 2011; Xie et al. 2007). tREMD simulations have been used to investigate how such modifications affect the conformational ensemble of IDPs. Zerze and Mittal studied the effect of O-linked glycosylation on the conformational ensemble of the tau 174–183 fragment and the hIAPP peptide (Zerze et al. 2015). For the tau peptide, 24 replicas in the temperature range of 300–545 K were simulated for 100 ns/replica. For the larger hIAPP peptide, a total of 40 replicas were used to cover the 300–500 K temperature range (150 ns/replica). This study found only a mild effect of glycosylation on the conformational ensembles of both peptides. In contrast, phosphorylation of Ser133 of the KID peptide leads to a significant redistribution of helical substates and is likely to affect recognition of its binding partner KIX (Ganguly et al. 2009). This implicit solvent simulation study employed 12 replicas in the range of 270–500 K with a simulation time of 200 ns/replica.
Although tREMD has proven very useful in analyzing protein conformational space, and there are many variants of tREMD as discussed in a recent review (Ostermeir and Zacharias 2013), a major limitation is its poor scaling with system size: the number of replicas needed increases as \(\mathcal {O}(f^{1/2})\), where f is the system’s total number of degrees of freedom. The reason for this poor scaling can be understood directly from the probability of accepting exchanges \(\exp (\delta \beta \delta E)\) between adjacent replicas. As systems get larger, the potential energy difference between adjacent replicas grows, so this rule dictates a smaller spacing in temperature in order to ensure viable acceptance probabilities, and thus more replicas are needed to cover the same temperature range. Several methods have been developed to overcome this limitation. Here, we will briefly discuss Hamiltonian replica exchange (hREMD) (Fukunishi et al. 2002) and solute tempering (Liu et al. 2005; Wang et al. 2011).
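The scaling argument can be illustrated with a small sketch. It assumes a geometric temperature ladder, a common heuristic that is not prescribed by the text above, and a relative temperature step proportional to \(f^{-1/2}\). The prefactor `c` and both function names are hypothetical and would need tuning for a real system.

```python
import math


def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures spaced by a constant ratio between t_min and t_max."""
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio**k for k in range(n_replicas)]


def replicas_needed(t_min, t_max, n_dof, c=1.0):
    """Rough replica count if the relative temperature step is c / sqrt(n_dof).

    `c` is a hypothetical constant that sets the target acceptance rate.
    """
    relative_step = c / math.sqrt(n_dof)
    n_steps = math.log(t_max / t_min) / math.log(1.0 + relative_step)
    return math.ceil(n_steps) + 1


# Quadrupling the number of degrees of freedom roughly doubles the replica count.
n_small = replicas_needed(300.0, 450.0, n_dof=10_000)
n_large = replicas_needed(300.0, 450.0, n_dof=40_000)
```

Under these assumptions the replica count grows with the square root of the number of degrees of freedom, which is why explicit-solvent systems quickly become expensive.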
Hamiltonian replica exchange
Although tREMD is the most commonly used implementation, other replica coordinates can be used to modify the underlying energy surface. In principle, any coordinate can be used as long as detailed balance is obeyed by employing the Metropolis acceptance criterion. The replica coordinate can be coupled to the Hamiltonian, or force field, of the system in a scheme that is generally referred to as Hamiltonian replica exchange MD (hREMD) (Fukunishi et al. 2002). Suitable replica coordinates facilitate backbone structural transitions by scaling the strength of, for instance, hydrogen bonds or hydrophobic interactions. The scaling is chosen so that the interactions become progressively weaker along the replica ladder and refolding transitions are fast in the most downscaled replica. Typically, fewer replicas are needed in hREMD than in tREMD to obtain similar conformational sampling.
Solute tempering
Another method developed to overcome poor scaling with system size in conventional tREMD simulation is the replica exchange with solute tempering (REST) method (Liu et al. 2005). In a REST simulation, the system is divided into two parts: one acts as a bath and remains at the temperature of interest, while the other (usually the whole protein, although part of the protein, or the protein and solvation shell waters, could be used instead) is effectively heated up. As only energy differences arising from intra-protein and protein–water interactions, but not water–water interactions, contribute to the acceptance probability, the number of replicas needed to cover a certain temperature range is significantly reduced. However, REST does not perform well for large systems involving sizable conformational changes (Huang et al. 2007).
Several groups have independently combined the core concept of REST, dividing the system into a cold and a hot part, with the idea of scaling Hamiltonians of hREMD (Moors et al. 2011; Terakawa et al. 2011; Wang et al. 2011). For the hot part of the system, the electrostatic, Lennard–Jones and proper dihedral terms (the force-field parameters contributing to energy barriers) are scaled such that the interactions inside this part are kept at an effective temperature of \(T/\lambda\), where \(\lambda\) is the scaling parameter. Interactions within the cold part are kept at temperature T and interactions between the cold and hot parts are kept at an intermediate temperature \(T/\sqrt {\lambda }\). This method, often referred to as REST2, has been implemented in GROMACS (Terakawa et al. 2011; Bussi 2014).
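A minimal sketch of this scaling is given below. It assumes that the potential energy has already been decomposed into hot–hot, hot–cold and cold–cold contributions, and it applies the scaling to the energies directly rather than to force-field parameters. The function names and the decomposition interface are illustrative assumptions, not the GROMACS implementation.

```python
import math


def rest2_energy(e_hot_hot, e_hot_cold, e_cold_cold, lam):
    """Scaled potential energy of one replica, with 0 < lam <= 1.

    Scaling an energy term by lam at temperature T is equivalent to
    sampling that term at an effective temperature T / lam.
    """
    return lam * e_hot_hot + math.sqrt(lam) * e_hot_cold + e_cold_cold


def effective_temperatures(temp, lam):
    """Effective temperatures felt by each class of interaction."""
    return {
        "hot-hot": temp / lam,
        "hot-cold": temp / math.sqrt(lam),
        "cold-cold": temp,
    }


# Example: lam = 0.6 at T = 300 K heats the solute to roughly 500 K.
temps = effective_temperatures(300.0, 0.6)
```

The replica with `lam = 1` recovers the unscaled Hamiltonian and is the one from which the ensemble at the temperature of interest is collected.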
The improved efficiency of REST2 makes this an attractive tool for studying the conformational ensemble sampled by IDPs. A recent application of this method to the disordered N-terminal fragment of the nacre protein allowed a reduction of almost a factor of six in the number of replicas needed to span the required temperature range compared to conventional tREMD (Brown et al. 2014). The n16 nacre protein is a framework protein associated with biogenic mineral stabilization in the Japanese pearl oyster. Its 30-residue N-terminus (n16N) is essential for the stabilization of the mineral component in nacre and is largely disordered. REST2 simulations of n16N in its apo- and Ca²⁺-complexed forms, using 16 replicas to span a temperature range of 300–500 K, support the hypothesis that the peptide can be divided into three subdomains. The N-terminal subdomain and the central amyloid-like domain (SD1 and SD2) feature stabilization through intrapeptide aromatic contacts. The C-terminal subdomain (SD3) has a higher charge density and shows more structural flexibility. This domain is likely to play a crucial role in capturing Ca²⁺, whereas SD1 and SD2 are essential for the formation of interpeptide contacts and hence multipeptide complexes (Brown et al. 2014). REST2 was also used to characterize the conformational space of Helicobacter pylori UreG (Musiani et al. 2013), a class of intrinsically disordered enzymes involved in the maturation of the urease virulence factor in bacterial pathogens. The same protocol was applied to HypB, a protein from Methanocaldococcus jannaschii that is closely related in sequence and function but has not been classed as an IDP. A total of 24 replicas were needed to span a temperature range of 300–450 K, compared to an estimated 100 replicas if conventional tREMD had been used. The authors found that the regions involved in catalysis show substantial structural rigidity. In contrast to HypB, the regions in UreG that are involved in interaction with metallochaperones to form multiprotein complexes are more unfolded (Musiani et al. 2013).
It should be noted that although the hREMD method is powerful in enhancing backbone transitions (Ostermeir and Zacharias 2013), to the best of our knowledge, non-REST2 flavors of hREMD have not yet been successfully applied to characterize IDPs. We do, however, believe that these will prove useful in future work in this field, particularly in analyzing binding and unbinding effects of IDPs and binding partners as protein–protein interactions can be used as replica coordinates.
Metadynamics
Enhanced sampling methods that employ a biasing potential are often considerably cheaper in terms of computer time than replica exchange-based approaches. Out of the biased enhanced sampling methods, metadynamics-based approaches appear the most suitable for exploring the conformational ensembles sampled by IDPs. Similar to, for instance, conformational flooding (Grubmüller 1995) and local elevation methods (Huber et al. 1994), in a metadynamics simulation the system is discouraged from visiting previously explored regions by a biasing potential (Laio and Parrinello 2002). This history-dependent biasing potential is built by periodically depositing Gaussians along the trajectory of the collective variable (CV) (see Fig. 3):
$$ V_{G}(s(r), t) = w \sum\limits_{t^{\prime} =\tau_{G}, 2\tau_{G}, \ldots;\; t^{\prime}<t} \exp \left(-\frac{(s(r)-s(t^{\prime}))^{2}}{2{\sigma_{s}^{2}}}\right), \tag{1} $$
where s(r) is the CV as a function of the atomic coordinates, w and \(\sigma_{s}\) are the height and width of the Gaussians and \(\tau_{G}\) is the time interval at which they are deposited. Ultimately, when the simulation reaches equilibrium, the biasing potential should exactly compensate the unbiased free-energy profile along the chosen CV. Assuming a perfect choice of CV, the accuracy of the method depends wholly on the height and width of the Gaussians and the frequency at which these are deposited.
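For a single CV, the sum of Gaussians in Eq. 1 can be sketched as below. The helper names are hypothetical, time is measured in integer simulation steps, and the CV trajectory is assumed to be available as a plain list; in practice this bookkeeping is handled by the simulation code or a plug-in.

```python
import math


def deposited_centers(cv_trajectory, stride, t):
    """CV values s(t') at t' = stride, 2*stride, ... with t' < t (in steps)."""
    last = min(t, len(cv_trajectory))
    return [cv_trajectory[tp] for tp in range(stride, last, stride)]


def bias_potential(s, centers, height, width):
    """History-dependent bias at CV value s: a sum of Gaussians of equal height."""
    return sum(
        height * math.exp(-((s - c) ** 2) / (2.0 * width**2))
        for c in centers
    )


# Toy CV trajectory; Gaussians are deposited every 2 steps.
trajectory = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
centers = deposited_centers(trajectory, stride=2, t=5)
bias_here = bias_potential(0.2, centers, height=1.2, width=0.1)
```

The bias is largest where the CV has already spent time, which is what pushes the system towards unexplored regions.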
A major advantage of metadynamics over methods such as umbrella sampling (Torrie and Valleau 1977) and steered MD (Park and Schulten 2003) is that no a priori knowledge of the end states is required and that multiple CVs can be used. However, the efficiency of metadynamics scales poorly with the number of CVs used and in practice is limited to three CVs. Moreover, in practical applications, the free-energy profile does not converge to a definite value but rather fluctuates around the correct result. There is also a significant risk of pushing the system into physically irrelevant regions of phase space. Hence, several adaptations to the scheme have been proposed. Here, we will discuss the popular well-tempered (Barducci et al. 2008; Bonomi et al. 2009) and bias-exchange methods (Piana et al. 2007). All of these versions of metadynamics are available in the PLUMED plug-in (Tribello et al. 2014), which interfaces with many popular simulation packages.
Well-tempered metadynamics
Well-tempered metadynamics (WTM) was developed to address issues of poor convergence and risk of sampling outside the physically relevant phase space in standard metadynamics simulations. In this method, the height of the Gaussians is not fixed but scaled, ensuring dampening of the biasing potential towards the exact result in the limit of long simulations.
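The height scaling can be sketched as follows. The exponential form and the bias factor \(\gamma = (T + \Delta T)/T\) follow the usual well-tempered formulation and are not spelled out in the text above, so they should be read as assumptions of this example. Function names and the kJ/mol units are likewise illustrative.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K); unit choice is an assumption


def well_tempered_height(initial_height, current_bias, temp, bias_factor):
    """Gaussian height scaled by the bias already accumulated at this CV value.

    bias_factor must be > 1; delta_temp = (bias_factor - 1) * temp.
    """
    delta_temp = (bias_factor - 1.0) * temp
    return initial_height * math.exp(-current_bias / (K_B * delta_temp))


def deposit(s, centers, heights, width, initial_height, temp, bias_factor):
    """Add one Gaussian at CV value s, with a height damped by the existing bias."""
    current_bias = sum(
        h * math.exp(-((s - c) ** 2) / (2.0 * width**2))
        for c, h in zip(centers, heights)
    )
    heights.append(
        well_tempered_height(initial_height, current_bias, temp, bias_factor)
    )
    centers.append(s)


# Depositing repeatedly at the same CV value yields ever smaller Gaussians.
centers, heights = [], []
for _ in range(3):
    deposit(0.0, centers, heights, width=0.1, initial_height=1.2,
            temp=300.0, bias_factor=10.0)
```

Because the added height shrinks wherever bias has already accumulated, the bias grows more and more slowly in well-explored regions instead of fluctuating indefinitely.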
WTM has been applied to characterize binding of the IDPs PTMA and NRF2 to the Kelch domain of Keap1 (Do et al. 2015). Interaction with NRF2 is crucial for the regulation of cellular responses to oxidative stress. Although there is a high degree of sequence similarity between the Kelch binding domains of PTMA and NRF2, the affinity of PTMA is approximately 100 times weaker. Two 3 µs well-tempered metadynamics simulations using three CVs were run with either PTMA or NRF2 and the Kelch domain. Multiple binding and unbinding events were observed for both systems. NRF2’s higher affinity for the Kelch domain may be explained by the observation that PTMA is much more disordered than NRF2 (Do et al. 2015). In its unbound state, NRF2 has a tendency to form short hairpin structures, supporting the hypothesis of coupled folding and binding for the NRF2-Kelch complex.
Bias-exchange metadynamics
Bias-exchange metadynamics combines the strengths of metadynamics and replica exchange (Piana et al. 2007). In this method, several metadynamics simulations of the system are run in parallel, all being biased in independent CVs. It is crucial here to include one ’neutral’ replica, which does not experience a biasing potential. Exchanges between the systems are attempted periodically in a replica exchange-type fashion.
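A swap attempt between two such replicas can be sketched as below. The acceptance rule used here, which compares each configuration's bias energy under its own and under the other replica's biasing potential, is the standard bias-exchange criterion and is stated as an assumption, since the text above does not give it explicitly. The callables standing in for the biasing potentials are hypothetical.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K); unit choice is an assumption


def bias_exchange_probability(bias_a, bias_b, config_a, config_b, temp):
    """Acceptance probability for swapping the configurations of replicas a and b.

    bias_a, bias_b     : callables returning the bias energy (kJ/mol) of a configuration
    config_a, config_b : current configurations (here, simply CV values)
    """
    beta = 1.0 / (K_B * temp)
    exponent = beta * (
        bias_a(config_a) + bias_b(config_b)
        - bias_a(config_b) - bias_b(config_a)
    )
    return 1.0 if exponent >= 0.0 else math.exp(exponent)


def neutral_bias(config):
    """The neutral replica experiences no biasing potential."""
    return 0.0


def toy_bias(config):
    """Hypothetical bias that penalizes negative CV values by 5 kJ/mol."""
    return 5.0 if config < 0.0 else 0.0


# The biased replica sits in a penalized region, so handing that
# configuration to the neutral replica is always accepted.
p_swap = bias_exchange_probability(toy_bias, neutral_bias, -1.0, 1.0, 300.0)
```

All replicas run at the same temperature, so only the bias energies enter the criterion; the neutral replica collects configurations that are unbiased on average.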
This method has successfully been applied to investigate the mechanisms by which the oncoprotein c-Myc is inhibited by a small drug molecule (Michel and Cuchillo 2012). Seven biased replicas (with different CVs) and one neutral replica were simulated for 120 ns (per replica) to generate the unbound (Apo) ensemble. It should be noted that the CVs used here were general, rather than optimized specifically for this system. One more replica was added for the bound (Holo) state biased in a CV accounting for ligand interaction. The authors show that the ligand-binding domain of c-Myc can bind the ligand in multiple distinct conformations. Interestingly, many of these conformations are also wholly or partially present in the unbound ensemble, providing support for a conformational selection mechanism (Michel and Cuchillo 2012).
Comparing sampling efficiencies of metadynamics-based approaches
The sampling efficiencies of unbiased MD, bias-exchange metadynamics and well-tempered metadynamics simulations have recently been compared for a 20-residue disordered peptide derived from the Neh2 domain of the Nuclear factor erythroid 2-related factor 2 (Nrf2) protein (Do et al. 2014). The authors compared conformational ensembles obtained from 3 µs unbiased MD simulation, 3 µs well-tempered metadynamics simulation using two CVs and a 460-ns bias exchange metadynamics simulation with eight replicas (seven biased and one neutral) and validated their results against X-ray crystallography and NMR spectroscopy data. Although both metadynamics protocols significantly enhance sampling, the bias-exchange scheme proved far more effective than well-tempered metadynamics (Do et al. 2014). General CVs, like β-sheet content and number of hydrogen bonds, were used and uniformly applied to all Neh2 residues. As such, no prior knowledge of experimentally observed structures is necessary and they do not bias towards a limited set of predefined structures.
Similarly, the sampling efficiencies of temperature replica exchange MD and bias-exchange metadynamics simulations have been assessed for the IDP hIAPP (Zerze et al. 2015). Forty replicas, spanning the 300–575 K temperature interval, were used for the tREMD simulation, with a simulation time of 200 ns/replica (8 µs cumulative simulation time). The bias-exchange metadynamics simulation employed seven biased replicas and a neutral replica with a simulation time per replica of 650 ns (5.2 µs cumulative simulation time). The free-energy profiles and secondary structures populated in the ensembles obtained with the two approaches are very similar. However, bias-exchange metadynamics explores larger regions of conformational space with less (cumulative) simulation time, suggesting that this method is computationally more efficient. The authors do note that bias-exchange metadynamics simulations are less straightforward to set up as the choice of CVs needs to be validated (Zerze et al. 2015).