Performance of CHARMM36m with modiﬁed water model in simulating intrinsically disordered proteins: a case study

Molecular dynamics simulations can be a powerful tool to complement experiments in the study of the structures and dynamics of intrinsically disordered proteins. Though the accuracy of the physics-based all-atom force ﬁelds has improved signiﬁcantly in simulating structured proteins over the past twenty years, most of these force ﬁelds face a big challenge to simulate ﬂexible proteins. Recently, CHARMM36m with modiﬁed TIP3P model was proposed as a possible solution to simulate intrinsically disordered proteins. Here, we tested the proposed solution using an extensively studied protein, namely NCBD, to explore the performance of CHARMM36m plus modiﬁed TIP3P water. Our results suggest that the modiﬁed TIP3P water model does enhance the sampling of conformational space compared to the standard TIP3P water model. However, the new CHARMM36m force ﬁeld still leads to over-compact structures and over-stabilized helices.


INTRODUCTION
In traditional structural biology, a globular protein has a single stable tertiary structure. The discovery of intrinsically disordered proteins (IDPs), which do not have a unique stable structure under physiological conditions, is challenging the traditional structural biological paradigm (Click et al. 2010;Dunker et al. 2008;Dyson and Wright 2005;Tompa 2002;Wright and Dyson 1999). One unique property of IDP sequences is that the primary sequence of an IDP is enriched with polar and charged amino acids, along with decreased amounts of non-polar residues. Such decreases of nonpolar residues have limited the capability of IDPs to form hydrophobic cores, which are the key contributors leading to stable structures in structured proteins (Dunker et al. 2001;Huang and MacKerell 2018). This distinct sequence composition enables IDP's ability to switch between or sample different tertiary structural states. Their constant structural fluctuation allows a single IDP to perform a multitude of biological functions (Uversky et al. 2005), such as roles in cellular signaling (Smock and Gierasch 2009) and regulation (Fuxreiter et al. 2008;Babu et al. 2011). IDPs have also been associated with several pathological conditions, including cancer (Iakoucheva et al. 2002;Metallo 2010;Uversky et al. 2008) and neurodegenerative diseases (Uversky et al. 2014).
Experimental techniques, such as small-angle X-ray scattering (SAXS), nuclear magnetic resonance (NMR) and Forster resonance energy transfer (FRET) spectroscopy, are often being used to study IDPs (Eliezer 2009;Sapienza and Lee 2010;Yengo and Berger 2010). The major challenges are that the heterogeneous ensembles of IDPs and their rapid inter-conversion between conformations make it difficult to obtain detailed structural information solely from experiments. Over the past years, molecular dynamics (MD) simulations with physics-based force fields have become more refined to simulate protein folding and interactions. Molecular mechanics studies protein dynamics by employing a potential energy function (force fields) that includes bonded and non-bonded interactions, a set of parameters, and the interactions between the protein and water, to computationally describe a protein system. The accuracy of the classic atomic force fields has kept improving over the past 20 years. Molecular dynamics simulations could lead to the high level of detail which is critical to studying the conformational ensembles of an IDP and the dynamics behind its strongly fluctuating structure (Zhang et al. 2012) and can be a powerful tool to accompany experimental techniques to address the above challenges in experiments.
Molecular dynamics simulations also face some challenges, mainly inadequate conformational sampling and force field accuracy. The first challenge can be compensated by extending the timescale of simulations and employing techniques such as replica-exchange molecular dynamics (REMD). In contrast, the second challenge is because of the approximation of nature, meaning that atomistic simulations can only be approximately correct and are limited to our understanding of the system we are trying to model. Additionally, force fields developed for one type of system (structured proteins) may not be fully transferable to other systems (small peptides, IDPs) (Knott and Best 2012). Moreover, IDPs require force field developers to improve the balance of the secondary structure propensity as well as the balance between proteinwater and protein-protein interactions. These requirements are especially important for IDPs giving their lack of a stable hydrophobic core. Regarding the secondary structure preference, various force fields corrected their backbone and side-chain parameters, leading to the development of Amber ff99SB, ff03* (Hornak et al. 2006), ff99SB-ILDN (Hornak et al. 2016), ff99IDPs (Wang et al. 2014), as well as CHARMM36 (Huang and Mackerell 2013). Subsequently, Amber ff03ws (Best et al. 2014) and the recently developed CHARMM36m (Huang et al. 2017) are targeted to address the proteinwater interactions and other issues identified. Studies have tested the performance of these forcefields (Palazzesi et al. 2014;Su et al. 2019;Ye et al. 2017), identifying an improvement in the modeling of IDPs.
The nuclear coactivator binding domain (NCBD) is an extensively studied IDP by both experiments and computations because of its high residual structure. NCBD forms complexes with several proteins, displaying a different tertiary structure in each binding configuration. NCBD is characterized as three helices organized in a helix-bundle topology with several, low intensity, longrange couplings between helices H1 and H2, and some between H2 and H3 (Naganathan and Orozco 2011). NCBD has become one of the few experimentally validated examples of structural flexibility (Demarest et al. 2004;Oldfield et al. 2008) and it has two NMR structures of the unbound state determined as of today (Kjaergaard et al. 2010;Lin et al. 2001). Because of the wide array of experimental data available on NCBD, it is a prime candidate to work with computationally. NCBD has been studied with numerous force fields including several iterations of CHARMM (Papaleo et al. 2018;Zhang et al. 2012) and Amber (Burger et al. 2012;Knott and Best 2012;Naganathan and Orozco 2011). Nonetheless, these studies have revealed issues related to force field accuracy, including over-compaction and over-stabilization of the protein.
In this paper, we have investigated the performance of the latest CHARMM36m force field as well as the effects of different TIP3P water models, on their ability to accurately model and sample the conformational ensemble of IDPs, using NCBD as a model protein.
Specifically, we compared the average radius of gyration obtained here to those reported by previous computational and experimental observations, and found a potential preference to compact structures/overstabilization of the protein in the force field. Additionally, secondary structure preference and conformational space sampling were evaluated to find a potential overemphasis on secondary structures. Finally, the modified TIP3P water model was found to be able to enhance the sampling of a larger conformational space when compared to the standard water model.

Preference to compact structures
As mentioned in the Introduction section, a major problem when modeling IDPs through physics-based atomistic models is overly compact ensembles (Best et al. 2014;Henriques et al. 2015;Piana et al. 2014;Rauscher et al. 2015). Previous simulations of NCDB with different force fields faced this problem (Burger et al. 2012;Knott and Best 2012;Papaleo et al. 2018). The modified TIP3P water model intends to fix this problem by increasing the dispersion interactions between the protein and water (Huang et al. 2017). In the timescale studied, the modified water system sampled more ''open'' conformations when compared to the standard TIP3P water (Fig. 1).
However, when calculating the average R g for each system, the difference is reasonably small (1.21 nm for modified vs 1.2 nm for standard). Details of the average value and the standard deviation of each trajectory are shown in Table 1. It also results much more compact when compared to that obtained by previous simulations (1.37-1.49 nm) (Knott and Best 2012;Papaleo et al. 2018) and of course lower to the values estimated experimentally under ''native-like'' conditions (1.52 nm) (Kjaergaard et al. 2010). This could be explained by the fact that previous works employed different techniques to improve their samplings, such as REMD and experimental restraints (Knott and Best 2012;Papaleo et al. 2018). Additionally, it is important to mention that only one Lennard-Jones well depth (e H ) value was tested and it was previously stated that no universal e H applies to all IDP systems, so it might be necessary to decrease or increase this value to get the desired effect (Huang et al. 2017).

Over-stabilization of helical structures
Another problem that is common when modeling IDPs is the over-stabilization and preference bias to secondary structures, in particular, helices. To evaluate this effect in CHARMM36m, the secondary structure (specifically the helicity) was calculated for each system, using the define secondary structure of proteins (DSSP) algorithm (Kabsch and Sander 1983). Both systems (Fig. 2) presented three regions of high helical propensity that correspond in sequence location to the helixes presented in unbound NCBD (Kjaergaard et al. 2010). However, previous works on NCBD rarely report helicity above 0.8 (or 80%). This indicates possibly over-stabilization of the helices, which was also observed for NCBD with previous force fields (Naganathan and Orozco 2011; Zhang et al. 2012) even at high temperatures. Additionally, some of the previous studies (Knott and Best 2012;Zhang et al. 2012) found a bimodal behavior in the region corresponding to Helix 2 (residues 23-35), which was not observed here.
Contact maps (Fig. 3) were also constructed. Two residues were considered to be in contact if the distance between two heavy atoms of these residues, which must be more than four residues apart in the sequence, was less than 0.55 nm. Based on this definition, the intrahelical contacts were identified (i, i ? 4) presenting a high probability of contact (diagonal), which correlates with the high helicity found using DSSP. Perpendicular to these are the inter-helical contacts, which were also similar for both systems, with minor differences in the probability. The other tertiary contacts appear to be distinctive or at least with a different probability Fig. 1 The radius of gyration for the trajectories of each system. Notice that the standard water system rarely samples conformations with an R g above 1.6 nm, whereas the modified system can sample up to 2.1 nm between the two systems. Specifically, there were some contacts between the C-terminal and the first helix, as well as with the N-terminal. This could be an indication of a compact structure which is in agreement with the low radius of gyration obtained. These contacts were presented with a slightly higher probability in the standard water system than in the modified ones. Finally, both systems seem to have similar results, presenting the most difference in the C-terminal region of Helix 3. This is because of the low percentage of interactions between the C-terminal of the protein and the rest of NCBD as shown in Fig. 3.

Modified TIP3P samples larger conformational space
Next, the conformational space sampled by each system was studied. First, protein structures of both systems were clustered with a cutoff of 0.25 nm. This clustering was done by using the core (no N or C-terminal) for the least-squares fit and RMSD calculations. By excluding the terminal movements (which can be significant in terms of RMSD values), conformational changes in helical packing can be identified. With this cutoff, the simulations with the modified TIP3P water generated about twice more clusters than the ones using the standard TIP3P water. Additionally, the modified system, in general, had clusters that represented smaller amounts of conformations (low percentage). These could be indications that the modified system sampled a more heterogeneous free energy space. The top five clusters (Fig. 4), which on average represent *65% of the structures, consisted of folded structures (high residual structure) and some appear to be more compact than the initial structure. All the top structures present three highly structured helixes, as expected from DSSP analysis. However, there are some changes in the packing of the helixes, which can be seen clearly in Clusters 2 and 4 of the modified TIP3P water systems and to a less extent in Clusters 4 and 5 of the standard TIP3P systems.  Having studied representative structures from each system, free energy landscapes based on these trajectories were generated to examine the sampling capabilities of molecular dynamics simulations using different water models. Given that the modifications in the water model were implemented to try to replicate chain dimensions and the ability to model an intrinsically disordered protein is being tested, the order parameters chosen were the radius of gyration (R g ) and the fraction of native contacts (Q). As can be seen in Fig. 5, the modified TIP3P water system samples a larger conformational space. The standard water system is mostly limited by its inability to sample higher radius of gyration, causing repeated sampling of the same space, many times with R g values lower than the initial structure (*1.45 nm). While the modified system also has its minimums at low R g values, its sampling is more evenly spread with no significant energy barriers. Additionally, it is also clear that the modified TIP3P system samples more conformations with Q \ 0.5, which although may not imply the sampling of unfolded structures, does indicate the sampling of structurally different conformations compared to the initial/NMR structure.

DISCUSSION
The purpose of this study is to evaluate the proposed solution of using the latest CHARMM36m with a modified TIP3P water model to simulate IDPs as well as determine whether the proposed modification in the protein-water interactions is enough to improve chain dimensions in MD simulations. The modification of the water model allows NCBD to sample more open conformations (and larger conformational space), based on the larger value of the calculated average radius of gyration. It is clear that, for the tested e value and the timescale reached, such modification is not enough to replicate the chain dimensions obtained experimentally.
Higher e values may need to be tested, although the physical validation of this is uncertain. Additionally, it may be necessary to perform REMD simulations, as limitations set by the starting structure and general problems with convergence may contribute to these results. As far as the CHARMM36m force field, with or without the water model modification, there seems to be some weakness in the force field. To the extent of this study, there is a potential overemphasis on secondary structure, over-stabilization of the protein in general, and the possible underestimation of protein-water interactions.

METHOD
All-atom simulation details NCBD (59 residues) was simulated using two water models: the standard TIP3P and the modified TIP3P water as described by Huang et al. (2017). The only difference between these two water models is that the parameter describing e H between the water hydrogen atoms is changed from -0.046 kcal/mol in the standard CHARMM TIP3P water model to -0.1 kcal/mol in the modified CHARMM TIP3P water model (Huang et al. 2017). The structure for NCBD was obtained from the ligand-free state solution NMR structure (PDB ID: 2KKJ) (Kjaergaard et al. 2010) as shown in Fig. 6. All simulations were carried out using the CHARMM36m force field with explicit solvents and the Groningen Machine for Chemical Simulations (GROMACS) package (version 2018.3) Berendsen et al. 1995;Pall et al. 2015). The protein was placed in a cubic box with the corresponding water model and counter ions (Cl -) to neutralize the whole system at 304 K. Longrange electrostatics is calculated using the particlemesh Ewald (PME) algorithm (Darden et al. 1993;Essmann et al. 1995). Periodic boundary conditions were applied in all directions. Each system of protein, water, and counter ions was prepared using CHARMM-GUI (Jo et al. 2008;Lee et al. 2016), which generates a series of GROMACS inputs for subsequent MD simulations. To generate equilibrated starting structures for the MD simulations, steepest-descent minimization was carried out, followed by a 1-ns MD equilibrium simulation with a time step of 1 fs, to heat the whole system from 1 K to the desired temperature. All bonds with hydrogen atoms are converted to constraints with the algorithm LINear Constraint Solver (LINCS) (Hess et al. 1997), using the default parameters of the GROMACS package. The equilibrated structures obtained from the above steps were used for subsequent production runs. A Nose-Hoover temperature thermostat (Nose 1984;Hoover 1985) was used to maintain the temperature. The time step was 2 fs, and snapshots were taken every 100 ps. For both the standard TIP3P and the modified TIP3P water model systems, a cubic water box size of 10 nm was employed and run for a total of 20 ls, including two 10-ls long MD trajectories.
Acknowledgements This work was supported by the Research Allocations Committee (RAC) Award and Substance Use Disorders Grand Challenge Pilot Research Award at the University of New Mexico, the startup fund from the University of New Mexico.

Compliance with Ethical Standards
Conflict of interest Laura I. Gil Pineda, Laurie N. Milko, and Yi He declare that they have no conflicts of interest.
Human and animal rights and informed consent This article does not contain any studies with human or animal subjects performed by any of the authors.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/ licenses/by/4.0/.