Application of the random coil index to studying protein flexibility
- Cite this article as:
- Berjanskii, M.V. & Wishart, D.S. J Biomol NMR (2008) 40: 31. doi:10.1007/s10858-007-9208-0
Protein flexibility lies at the heart of many protein–ligand binding events and enzymatic activities. However, the experimental measurement of protein motions is often difficult, tedious and error-prone. As a result, there is a considerable interest in developing simpler and faster ways of quantifying protein flexibility. Recently, we described a method, called Random Coil Index (RCI), which appears to be able to quantitatively estimate model-free order parameters and flexibility in protein structural ensembles using only backbone chemical shifts. Because of its potential utility, we have undertaken a more detailed investigation of the RCI method in an attempt to ascertain its underlying principles, its general utility, its sensitivity to chemical shift errors, its sensitivity to data completeness, its applicability to other proteins, and its general strengths and weaknesses. Overall, we find that the RCI method is very robust and that it represents a useful addition to traditional methods of studying protein flexibility. We have implemented many of the findings and refinements reported here into a web server that allows facile, automated predictions of model-free order parameters, MD RMSF and NMR RMSD values directly from backbone 1H, 13C and 15N chemical shift assignments. The server is available at http://wishart.biology.ualberta.ca/rci.
Keywords: NMR, Chemical shift, Protein, Flexibility, Order parameters
Over the past 30 years, NMR spectroscopy has emerged as one of the most useful tools to investigate protein flexibility due to its remarkable ability to provide site-specific information about protein motions over a large range of time scales. In particular, residue-specific measurements of hydrogen exchange (spanning periods of seconds to hours), conformational exchange (μs to ms) and order parameters (ps to ns) can be performed via NMR (Kay 1998). These methods have proven to be very robust and are now routinely used in most NMR studies of protein dynamics (Ishima and Torchia 2000). However, all of these approaches typically require conducting additional non-trivial NMR experiments followed by extensive and complex data analysis (Lacroix et al. 1997; Palmer 2001). This makes many NMR-derived dynamic measurements a relatively tedious and error-prone process. Recently, we described a simple approach for measuring protein flexibility that doesn’t require any more information than backbone chemical shift assignments. The method, called the Random Coil Index (RCI), is based on an empirically derived relationship between secondary chemical shifts and protein mobility (Berjanskii and Wishart 2005). Most importantly, the RCI can be quantitatively related to other standard measures of protein motions such as model-free order parameters and per-residue root mean square fluctuations (RMSF) calculated from molecular dynamics simulations and per-residue root mean square deviations (RMSD) calculated from NMR ensembles.
The original RCI paper, which was published as a brief communication, presented the concept as an empirically driven hypothesis with little formal justification or rationale. At the time, relatively few details could be provided about time-scales, limitations, advantages and minimal data requirements for this method. Since then, we have undertaken a far more detailed study of the RCI concept and have developed a more complete understanding of how chemical shifts can (and cannot) be used to monitor protein flexibility. In the work presented here, we (1) explain how the RCI was derived and why it works; (2) demonstrate the efficacy and accuracy of the RCI on a much larger set of proteins; (3) assess the sensitivity of the RCI method to chemical shift referencing errors, random coil reference values and sequential corrections; (4) establish the minimal data requirement for reliable RCI predictions; (5) ascertain the time-scale of RCI-detected motions; (6) identify the theoretical limitations of the method, and (7) show the advantages of the RCI method over existing methods used to study protein dynamics. We believe that a detailed investigation of these issues is critical to establish the utility and legitimacy of the RCI concept within the NMR community. Furthermore, in an effort to encourage greater use of the RCI method in routine protein analysis, we created a freely accessible RCI web server that allows facile, automated predictions of model-free order parameters, MD RMSF and NMR RMSD values directly from backbone 1H, 13C and 15N chemical shift assignments (available at http://www.wishart.biology.ualberta.ca/rci).
RCI derivation and principles
The connection between random coil chemical shifts and protein flexibility has been known for several decades (Grathwohl and Wuthrich 1974; Bundi et al. 1975; Wishart et al. 1991; Wishart and Sykes 1994a, b). Indeed, random coil chemical shifts can be formally defined as shifts that result from a fast conformational exchange among energy-weighted populations of all theoretically possible conformations of an unfolded polypeptide chain in the absence of long-range inter-residue interactions (Bundi and Wuthrich 1979; Vila et al. 2002). As the structure and mobility of a protein segment approach the random coil state, the chemical shifts of its atoms tend to move toward their corresponding random coil values. In fact, the proximity of amino acid chemical shifts to their random coil values has been used by many research groups to qualitatively estimate the level of protein structural disorder (Lecroisey et al. 1997; Chou et al. 2002; Fiaux et al. 2002). In addition to using chemical shifts to approximately assess protein flexibility, chemical shifts can also be used to identify regions displaying order or segmental rigidity. Over the past two decades, several approaches have been developed to identify rigid secondary structure elements from secondary chemical shifts and these methods are now routinely applied to the analysis of protein structure (Dalgarno et al. 1983; Pardi et al. 1983; Wishart et al. 1992; Wishart and Sykes 1994a, b; Wang and Jardetzky 2002a, b; Eghbalnia et al. 2005).
Our initial attempts to overcome this problem indicated that, by combining the chemical shifts from multiple nuclei (Cα, Cβ, CO, N, NH, and Hα) into a single parameter, one is able to decrease the level of unwanted “random coil shift noise” (Fig. 1g). The improved performance originates from the different probabilities of random coil chemical shifts from different nuclei being found among amino acid residues in flexible regions versus rigid regions. Typically, residues in rigid helices or rigid β-strands are less likely to have more than one random coil chemical shift among their backbone and Cβ shifts than residues in mobile regions (Fig. 1h). After testing a variety of mathematical expressions, it was determined that the simplest expression for combining chemical shifts in a way that correlated well with motional amplitudes was the inverse of the averaged, weighted secondary backbone shifts. We named this parameter the Random Coil Index (or RCI) since it quantitatively tracks the relative degree to which a protein segment matches the random coil state.
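The expression described above (the inverse of the averaged, weighted secondary shifts) can be illustrated with a minimal sketch. The nucleus labels, the uniform example weights and the small floor that guards against division by zero are all assumptions made for illustration; they are not the published coefficients, and absolute values of the secondary shifts are assumed.

```python
from typing import Dict

# Hypothetical, uniform weights for illustration only (NOT the published values).
EXAMPLE_WEIGHTS: Dict[str, float] = {
    "CA": 1.0, "CB": 1.0, "CO": 1.0, "N": 1.0, "HN": 1.0, "HA": 1.0,
}


def random_coil_index(secondary_shifts: Dict[str, float],
                      weights: Dict[str, float] = EXAMPLE_WEIGHTS,
                      floor: float = 1e-3) -> float:
    """Inverse of the mean weighted |secondary shift| over the nuclei available.

    secondary_shifts maps a nucleus label to (observed - random coil) in ppm.
    floor is an assumed safeguard so that a residue sitting exactly on its
    random coil values does not cause a division by zero.
    """
    nuclei = [n for n in secondary_shifts if n in weights]
    if not nuclei:
        raise ValueError("no usable chemical shifts for this residue")
    mean_weighted = sum(weights[n] * abs(secondary_shifts[n]) for n in nuclei) / len(nuclei)
    return 1.0 / max(mean_weighted, floor)
```

With these toy weights, a residue whose shifts lie close to random coil values (for example 0.2 ppm for Cα and 0.05 ppm for Hα) receives a much larger index than a residue with helix-like secondary shifts, which is the qualitative behavior the text describes.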
The “end-effect correction” can also be applied at this point. The last step of the protocol involves smoothing the initial set of RCI values by three-point averaging (Berjanskii and Wishart 2006).
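The three-point averaging step can be sketched as follows. How the first and last residues are treated is an assumption here (they are averaged over the two points that exist); the original protocol may handle the termini differently.

```python
from typing import List


def smooth_three_point(values: List[float]) -> List[float]:
    """Replace each value by the mean of itself and its immediate neighbours.

    The first and last values are averaged over the two points available,
    which is an assumed convention for the chain termini.
    """
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - 1): min(n, i + 2)]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Applied to a per-residue RCI profile, this damps isolated single-residue spikes while leaving broader mobile segments visible.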
Optimization and validation of RCI with a larger data set
Originally, the RCI method was trained on a set of just 14 proteins (Berjanskii and Wishart 2005). Because of concerns that this data set may have been too small and that it incompletely sampled the full dynamic or structural range found in proteins, we enlarged the original training set of proteins to 28 well-resolved proteins with complete 1H, 13C and 15N backbone assignments to ascertain the robustness and extensibility of the RCI method (Supplemental Table 2). The new training set consisted of 2988 residues, spanning a range of sizes (46–203 residues) and including all major protein fold classes (all α, all β, mixed α/β). This set also contained proteins with widely varying flexibility content (a flexible residue is defined as one with an order parameter less than 0.85) covering between 15% and 65% of the protein length (Supplemental Table 3). To obtain detailed, residue-specific information on the backbone mobility of these proteins, we calculated a set of 4-ns MD trajectories for each protein using GROMACS 3.2.1 (Lindahl et al. 2001) and the GROMOS96 43a1 force field (Scott et al. 1999). Details of the MD simulation protocol have been published elsewhere (Berjanskii and Wishart 2005). The full set of MD simulations conducted for the study reported here required more than 8,000 CPU hours in total on a cluster of 840 3.0 GHz Xeon processors (WestGrid supercomputer, Canada). Residue-specific amide nitrogen RMSFs were calculated for each protein trajectory with GROMACS. These RMSF values served as a proxy measure of that protein’s backbone mobility. Weighting coefficients were determined by optimizing the correlation between the calculated RCI values and the MD RMSF using a simple grid search.
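A grid search of this kind can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the candidate weight values, the use of the Pearson coefficient as the objective and the simplified RCI expression (inverse of the mean weighted absolute secondary shift, with a small floor) are all hypothetical choices.

```python
import itertools
import math
from typing import Dict, List, Optional, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either series has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rci(residue: Dict[str, float], weights: Dict[str, float], floor: float = 1e-3) -> float:
    """Simplified RCI: inverse of the mean weighted |secondary shift| (assumed form)."""
    mean_weighted = sum(w * abs(residue[n]) for n, w in weights.items()) / len(weights)
    return 1.0 / max(mean_weighted, floor)


def grid_search_weights(residues: List[Dict[str, float]],
                        rmsf: Sequence[float],
                        nuclei: Sequence[str],
                        candidates: Sequence[float]) -> Tuple[Optional[Dict[str, float]], float]:
    """Try every combination of candidate weights; keep the best-correlating one."""
    best_weights, best_r = None, -2.0  # -2.0 is below any possible correlation
    for combo in itertools.product(candidates, repeat=len(nuclei)):
        weights = dict(zip(nuclei, combo))
        r = pearson([rci(res, weights) for res in residues], rmsf)
        if r > best_r:
            best_weights, best_r = weights, r
    return best_weights, best_r
```

The cost grows as the number of candidate values raised to the number of nuclei, so an exhaustive search is only practical for a coarse grid and a handful of weights, which matches the six-nucleus setting described here.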
Assessing minimal data requirements and the influence of random coil chemical shift reference values and nearest neighbor corrections
Chemical shift data from peptides and protein is often “noisy”, with missing assignments or problems from improper chemical shift referencing. There are also a large number of methods for correcting or calibrating chemical shifts and comparing them to different sets of random coil (or reference) shifts. Some of these are known to work well, while others do not. We undertook a number of comparative studies to assess how these chemical shift effects affect the RCI accuracy and to optimize the performance of the RCI method so that it would be more robust to this chemical shift “noise”.
Note that the RCI method was originally developed to deal only with proteins having a full set of 6 types of backbone chemical shift assignments (Cα, CO, Cβ, N, NH and Hα). However, not all proteins are routinely assigned to such an extent. Consequently, we decided to investigate how well the RCI method performs in situations where assignments of some of these nuclei are missing. The effect of excluding one or more types of nuclei was evaluated by calculating the correlation between the RCI (generated without a particular set of shifts) and its corresponding MD RMSF for all 28 test proteins. For each of the 63 different chemical shift combinations, we optimized the RCI weighting coefficients and calculated the mean correlation coefficients using a simple grid search strategy.
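The figure of 63 combinations follows from counting every non-empty subset of the six shift types (2^6 − 1 = 63). A short sketch that enumerates them, using hypothetical ASCII labels for the six nuclei:

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# ASCII labels chosen for this sketch: CA = Cα, CO = carbonyl C, CB = Cβ,
# N = amide nitrogen, HN = amide proton, HA = Hα.
NUCLEI = ("CA", "CO", "CB", "N", "HN", "HA")


def shift_combinations(nuclei: Sequence[str] = NUCLEI) -> List[Tuple[str, ...]]:
    """All non-empty subsets of the given shift types, smallest subsets first."""
    return [combo
            for size in range(1, len(nuclei) + 1)
            for combo in combinations(nuclei, size)]
```

Each subset returned by `shift_combinations()` corresponds to one assignment scenario for which a separate set of weighting coefficients would be optimized.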
To assess the influence of the choice of random coil reference shifts and nearest neighbor corrections on the accuracy of RCI-predicted protein flexibility, we tested eight combinations of four sets of random coil shift values (Wishart et al. 1995; Lukin et al. 1997; Schwarzinger et al. 2000; Wang and Jardetzky 2002a, b) and two sets of i ± 1 neighboring residue corrections (Schwarzinger et al. 2001; Wang and Jardetzky 2002a, b). In all cases, i ± 2 neighboring correction values published by Schwarzinger et al. (2001) were used. Note that the reference chemical shifts published by Lukin et al. (1997) lack proton chemical shifts and were supplemented with the Hα and NH chemical shifts generated by Wang and Jardetzky (2002a, b). Weighting coefficients in the RCI equation (for each combination of random coil values and nearest neighbor correction factors) were optimized against the correlation coefficient calculated between the RCI and MD RMSF values using 33 proteins (Supplemental Table 2).
The RCI web server
Results from the performance optimizations, reference corrections, missing data compensation, automatic renormalization, and nearest neighbor corrections described in the preceding pages were implemented into both a stand-alone program and a publicly available web server, called the RCI web server. The stand-alone program, which generates results identical to the server, was used to generate the data shown in the following pages. The RCI web server, which is designed for general access and single protein queries, accepts both SHIFTY and BMRB NMR-STAR formatted chemical shift files as input and provides both text files and graphical plots of the RCI values, predicted MD RMSF, NMR RMSD and order parameters as output. Additionally, the simple-to-use interface allows users to select the set of random coil reference values, the type of nearest-neighbor residue corrections, the method of chemical shift reference correction, the treatment of end-effects and the treatment of assignment gaps. The back end of the RCI web server is written in Python, while the graphical user interface is coded in Python and HTML. The server is available at http://www.wishart.biology.ualberta.ca/rci.
Results and discussion
RCI performance and minimal data requirements
A leave-one-out strategy was employed to test the performance of the RCI algorithm (and RCI server) in predicting protein flexibility. In particular, when the RCI values were calculated for a given protein, the chemical shifts for that protein were excluded from the set used to optimize the weighting coefficients. This reduces the risk of over-fitting and of bias in the results. The average correlation coefficient between the RCI and MD RMSF was 0.81 (identical to that obtained using the whole data set). To ensure that the good correlation was not a result of over-fitting, another 5 proteins, not included in the grid search and spanning a range of sizes from 106 to 286 residues (830 residues in total), were analyzed (Supplemental Table 2, italics). The average correlation between RCI and the MD RMSF values of these proteins was also 0.81. Note that nearly identical results (r = 0.82) were obtained with the smaller set of 14 proteins used in the original RCI paper. This suggests that the RCI method is robust and extensible to other proteins.
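The leave-one-out procedure can be written generically. In the sketch below, `fit` and `score` are hypothetical callables supplied by the caller (for instance, a weight optimization and a correlation calculation); they are placeholders rather than functions from the RCI program.

```python
from typing import Callable, Dict, TypeVar

Data = TypeVar("Data")
Model = TypeVar("Model")


def leave_one_out(proteins: Dict[str, Data],
                  fit: Callable[[Dict[str, Data]], Model],
                  score: Callable[[Model, Data], float]) -> Dict[str, float]:
    """For each protein, fit on all the others and score on the one held out."""
    results = {}
    for held_out in proteins:
        training = {name: data for name, data in proteins.items() if name != held_out}
        model = fit(training)                      # never sees the held-out protein
        results[held_out] = score(model, proteins[held_out])
    return results
```

Averaging the returned scores gives a single figure comparable to the mean correlation coefficient quoted in the text, with the caveat that the toy interface here says nothing about how the actual optimization was run.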
The minimal data requirements for optimal performance of the RCI program (and server) were assessed for each of the 63 possible chemical shift combinations. The results, which are too extensive to present here, are available in Supplemental Table 1. This table includes the weighting coefficients and the calculated mean correlation coefficients for each chemical shift combination. Nevertheless, some interesting trends are worth noting. In particular, the omission of Cβ, NH or N shifts from the RCI calculation had a minimal effect on the correlation between the per-residue RCI and MD RMSF values. In fact, the correlation coefficient dropped by only 0.04. On the other hand, including only Cβ, NH and N shifts in the RCI calculations decreased the average correlation coefficient by 0.17. This result is consistent with the smaller weighting coefficients for Cβ, NH and N shifts seen in the RCI expression (Eq. 2, Supplemental Table 1). It is also consistent with the smaller correlation coefficient between MD RMSF values and single-atom RCI values for these nuclei (Supplemental Table 1).
The low correlation between protein flexibility and the RCI derived from NH or N shifts could be partially explained by the high sensitivity of NH and N shifts to long-range shielding/de-shielding determinants, such as ring currents and local charges; see (Szilagyi 1995) for a review. Furthermore, the sensitivity of NH and N shifts to experimental conditions, such as temperature, pH, ionic strength and certain buffer components (Szilagyi 1995), may also be responsible for contributing to the disagreement between their secondary chemical shifts and calculated protein flexibility.
The poor correlation between Cβ secondary chemical shifts and backbone dynamics can potentially be explained by the sensitivity of Cβ shifts to side-chain mobility. For instance, calculations of Glu chemical shifts using density functional theory demonstrated that fluctuations of the χ3 angle can result in significant changes to the Cβ chemical shift (4 ppm) while having only a modest effect (<1 ppm) on Cα, CO, and Hα shifts (Xu and Case 2002). The same study revealed that nitrogen chemical shifts, which also correlate poorly with backbone mobility, can vary as much as 2 ppm due to χ3 fluctuations. Given that side chain mobility is often completely uncoupled from backbone motions (Wand 2001), one would expect nuclei that are particularly sensitive to side chain motions to be poorer predictors of backbone motions.
Weighting coefficients optimized for each possible combination of assigned nuclei have been integrated into the latest version of the RCI web server. These weighting coefficients were also scaled for each chemical shift combination in order to make the RCI value (Eq. 1) independent of the number of chemical shifts used in the calculation. To achieve this, weighting coefficients for every combination of chemical shifts were uniformly scaled to make the average weighting coefficient equal to the average of the weighting coefficients for the six-shift RCI (i.e., when Cα, CO, Cβ, N, NH and Hα are used). As a result, users will be able to obtain comparable RCI values independent of the number of chemical shift assignments used. However, we recommend using at least Cα, Hα, and CO shifts to obtain accurate predictions from the RCI server.
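The uniform scaling of weighting coefficients can be sketched as a single multiplicative factor. The function name and the example numbers used with it are hypothetical; only the rule itself (match the mean weight of the six-shift set) is taken from the text.

```python
from typing import Dict


def rescale_weights(subset_weights: Dict[str, float],
                    six_shift_weights: Dict[str, float]) -> Dict[str, float]:
    """Scale subset weights uniformly so their mean equals the six-shift mean."""
    target_mean = sum(six_shift_weights.values()) / len(six_shift_weights)
    current_mean = sum(subset_weights.values()) / len(subset_weights)
    if current_mean == 0.0:
        raise ValueError("cannot rescale weights whose mean is zero")
    factor = target_mean / current_mean
    return {nucleus: weight * factor for nucleus, weight in subset_weights.items()}
```

Because every weight is multiplied by the same factor, the relative importance of the nuclei within a combination is preserved and only the overall scale of the resulting RCI changes.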
Influence of random coil shift values and neighboring residue corrections
It was shown recently that the outcome of secondary structure identification by chemical shifts may depend critically on the choice of random coil reference values (Mielke and Krishnan 2004). We investigated how the choice of random coil reference shifts and nearest neighbor corrections affects the accuracy of RCI-predicted protein flexibility. Several different sets of statistically derived (Wishart et al. 1991; Wishart and Sykes 1994a, b; Lukin et al. 1997; Wang and Jardetzky 2002a, b) and experimentally derived (Richarz and Wuthrich 1978; Bundi and Wuthrich 1979; Glushka et al. 1989; Braun et al. 1994; Thanabal et al. 1994; Merutka et al. 1995; Wishart et al. 1995; Schwarzinger et al. 2000) random coil shifts have been published over the past two decades. The experimentally derived sets differ from each other due to a number of experimental design issues, including nearest neighbor effects, the choice of chemical shift referencing methods, the influence of end effects and experimental conditions (temperature, pH, buffer composition and ionic strength). Statistically derived sets of random coil chemical shifts differ from each other primarily due to the size of the reference databases (currently ranging from 36 to 415 proteins) from which they were derived. To date, only one of the statistical sets (Wang and Jardetzky 2002a, b) includes any form of nearest-neighbor residue correction, while two experimentally derived sets include neighboring residue corrections (Wishart et al. 1995; Schwarzinger et al. 2001).
As noted before, eight combinations of four sets of random coil shift values (Wishart et al. 1995; Lukin et al. 1997; Schwarzinger et al. 2000; Wang and Jardetzky 2002a, b) and two sets of i ± 1 neighboring residue corrections (Schwarzinger et al. 2001; Wang and Jardetzky 2002a, b) were used to assess the performance of the RCI program. In all cases, i ± 2 neighboring correction values published by Schwarzinger et al. (2001) were used. The results from this analysis revealed that the mean correlation coefficients vary over a very small range (between 0.79 and 0.81) for every combination of reference random coil chemical shift values and i ± 1 neighboring corrections (Supplemental Table 4). This result demonstrates that RCI-derived flexibility is relatively insensitive to the differences among published reference random coil chemical shifts and nearest neighbor sequential corrections. The RCI web server utilizes the random coil shifts published by Schwarzinger et al. (2001) as the default random coil values because this is currently the only set of reference shifts for which both i ± 1 and i ± 2 neighboring corrections were determined. However, the RCI web server allows a user to choose any of the eight combinations of random coil values and i ± 1 nearest-neighbor corrections.
Effect of different methods to correct mis-referenced chemical shifts
As noted earlier, secondary shifts (Δδ) depend not only on the choice of random coil chemical shifts or the inclusion of nearest neighbor corrections, but also on the accuracy of the reported chemical shifts (i.e., correct chemical shift referencing). Chemical shift referencing, particularly for 13C and 15N shifts, continues to be a problem with about 20% of newly deposited shifts being referenced in a non-standard way (Zhang et al. 2003). To ensure uniformity in chemical shift referencing for both the test and training set of proteins used in the RCI calculations, we re-referenced all 33 sets of chemical shifts using the structure-based SHIFTCOR protocol (Zhang et al. 2003). This kind of reference correction was critical in developing and refining the RCI protocol, but it also led to two important questions: (1) What are the consequences of using mis-referenced chemical shifts in an RCI calculation? (2) How can the RCI server correct chemical shift referencing problems in the absence of a 3D structure?
While chemical shift referencing errors can be corrected with SHIFTCOR, this reference correction protocol requires that the protein’s tertiary structure already be known. Since the RCI method is expected to be used prior to the determination of a protein structure, a structure-independent method of reference correction is needed. Fortunately, one exists. The PSSI re-referencing protocol (Wang and Wishart 2005) uses 1Hα shifts (which are rarely mis-referenced) to correct and calibrate mis-referenced 13C and 15N shifts. We have extended this method to perform reference shift corrections in the absence of 1Hα shifts (which are often unavailable in the published assignments of large proteins). This re-referencing protocol (called REFCOR) was implemented in the RCI web server. A comparison between the REFCOR and SHIFTCOR reference corrections on the RCI of the SV40 T-antigen DNA-binding domain is shown in Fig. 2. A more detailed description of the REFCOR protocol has been published elsewhere (Berjanskii and Wishart 2006).
RCI versus other methods to investigate protein dynamics
The Random Coil Index was originally developed and refined on data obtained from MD simulations. One may wonder how realistic these simulations are and whether the RCI results would correlate with other measures of protein mobility such as order parameters, B-factors, and NMR RMSD values. To assess these relationships, we analyzed all 33 proteins and determined the correlation of the RCI values with experimentally determined model-free (Lipari and Szabo 1982; Clore et al. 1990) order parameters (S2exp), theoretically predicted (Zhang and Bruschweiler 2002) order parameters (S2pred), amide nitrogen RMSD of NMR ensembles and B-factors from crystallographic structures.
Note that the difference between NMR RMSD (root mean square deviation) and MD RMSF (root mean square fluctuation) is that MD RMSF is a measure of in silico spatial fluctuation calculated over a period of time. MD RMSF depends on the protein model, the length of MD simulations, and the quality of the MD force-field. On the other hand, NMR RMSD is calculated from an NMR-derived structural ensemble and reflects the quality and quantity of NMR restraints, the NMR structure determination protocol and the manner of assembling the structural ensemble. Typically, per-residue RMSDs of NMR ensembles are not used to characterize protein dynamics, but rather to serve as a measure of ensemble precision or uncertainty of the ensemble’s average structure. However, the local variability of NMR ensembles also depends on the number of structural restraints (e.g., NOEs, dihedral restraints) and these can be significantly reduced in number when spectral peaks are broadened or diminished in intensity due to conformational exchange. Also, the number of NOEs is typically decreased (and, as a result, the NMR RMSD is increased) in flexible regions due to reduced local atomic density. Therefore, we have decided that the sensitivity of disorder in NMR ensembles to protein dynamics justifies the usage of per-residue NMR RMSD as a reporter of protein flexibility in the current work.
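For concreteness, a per-atom RMSF can be computed from a set of coordinate frames as the root mean square displacement of each atom from its own mean position. The sketch below assumes the frames have already been superimposed on a common reference and that each frame lists the same atoms in the same order; it does not reproduce the GROMACS calculation used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def per_atom_rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """RMSF of each atom about its mean position across pre-aligned frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    n_frames = len(frames)
    n_atoms = len(frames[0])
    rmsf = []
    for i in range(n_atoms):
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        mean_sq_dev = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        rmsf.append(math.sqrt(mean_sq_dev))
    return rmsf
```

The same arithmetic applied to the models of an NMR ensemble, rather than to time-ordered MD frames, yields a per-atom deviation from the mean structure; whether that matches the exact RMSD definition used for the ensembles in this work is an assumption.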
Correlation among different measures of motional amplitudes in proteins
The average absolute errors for S2, MD RMSF, NMR RMSD, and B-factor predicted from RCI values are 0.05, 0.40 Å, 0.43 Å, and 16.9, respectively.
The imperfect agreement between original and RCI-predicted measures of motional amplitudes (S2, MD RMSF, and NMR RMSD) should not be viewed negatively. Indeed, no single method discussed in this paper gives a complete and faultless picture of protein dynamics. For example, NMR ensembles may be affected by numerous factors unrelated to protein dynamics. They may sample conformational space incompletely due to investigator bias and the inclusion of unrecognized spin-diffusion NOEs (yielding upper bounds that are too tightly restrained). Likewise, NMR ensembles may be corrupted or structurally biased due to the mis-assignment of ambiguous NOEs or the mis-interpretation of spectral artifacts or noise peaks as NOEs. On the other hand, NMR ensembles may also over-sample conformation space due to investigator bias (neglect or inexperience) or a lack of NOEs and other conformational restraints.
The problems with NMR RMSD as a proxy measure of protein mobility are not unique. For instance, experimental order parameters may not properly reflect amplitudes of protein motions due to the poor separation of overall tumbling and internal dynamics (Korzhnev et al. 1997). Likewise, opposing effects arising from motions on different time-scales for both transverse (Palmer et al. 2001) and longitudinal (Fushman et al. 1997) relaxation processes, site-specific variations of 15N CSA (Damberg et al. 2005) and other artifacts (Case 2002) can lead to incorrect order parameter estimates. Predicted order parameters are not without their problems either. These theoretical measures may miss long-range correlated motions (Zhang and Bruschweiler 2002) and appear to be quite sensitive to sub-angstrom inaccuracies in the 3D model, from which they are derived.
B-factors also have their faults as they can manifest not only the internal dynamic disorder of a protein but also multiple conformations in different unit cells (internal static disorder). B-factors can also be corrupted by refinement errors, the contributions of more than one atom to a particular electron density, intermolecular crystal packing contacts, lattice defects and lattice vibrations (Petsko and Ringe 1984; Carugo and Argos 1999). Likewise, MD simulations are not immune to criticism. MD RMSF can suffer from a number of problems including incomplete conformational sampling (Elofsson and Nilsson 1993) and large uncertainties of motional amplitudes in mobile regions (Horita et al. 2000). Insufficient system equilibration, numerical rounding errors, limitations of energy functions and treatments of long-range non-bonded interactions can all contribute to inaccuracies of MD-derived amplitudes of protein motions.
The inherent errors and uncertainties associated with these techniques are not the only contributors to the imperfect level of agreement. Discrepancies also arise from the different time-scales to which these methods are most sensitive. MD simulations are computationally limited to monitoring ps to ns motions. Experimental model-free order parameters reflect the amplitude of motions on a picosecond to nanosecond time-scale (Lipari and Szabo 1982; Clore et al. 1990). Their precision and accuracy decrease when the time-scale of internal motions approaches that of overall tumbling (Jin et al. 1998; Chen et al. 2004). In contrast to experimental order parameters, B-factors can be affected by uncorrelated (Petsko and Ringe 1984) motions that may occur over a much longer period of time (hours to days) during the acquisition of X-ray data. The expression for the predicted order parameter (Zhang and Bruschweiler 2002) has been empirically optimized to convert protein-packing density into a model-free order parameter (ps–ns time scale). However, it has not been ruled out that the predicted order parameter can represent motions on other time scales. In fact, it has been shown that protein packing can also be related to crystallographic B-factors (Halle 2002), which are sensitive to motions over much longer time scales than the time-scale of model-free order parameters. Chemical shifts, as manifested by the Random Coil Index, are expected to be most sensitive to motions on time-scales ranging from 100’s of ps to possibly ms. This time scale appears to depend on the coalescence conditions of contributing nuclei and the heterogeneity of sampled chemical shift space (see a discussion about RCI time-scale later in the paper).
Despite their sensitivity to different time-scales, all of the previously mentioned methods, including the RCI, share the ability to identify protein “hot spots” with increased mobility. Given the ease with which RCI values can be calculated, we believe the RCI method could prove to be particularly useful in conducting quick “sanity” checks of the motional amplitudes obtained with other methods (e.g., NMR RMSD, S2, and B-factors).
In the next example, we will show how the RCI method can be used to determine whether an NMR ensemble has reached the appropriate degree of overall structural diversity. Figure 4b compares the RMSD values for two different NMR ensembles of ubiquitin. One ensemble (PDB ID: 1XQQ) has a total of 128 different conformers and was determined via a novel dynamically optimized structure generation process (Lindorff-Larsen et al. 2005). The other ensemble (PDB ID: 1D3Z) consists of just 10 conformers and was determined with additional residual dipolar coupling (RDC) restraints (Cornilescu et al. 1999). As seen in Fig. 4b, the RCI plot correlates better (r = 0.74) with the dynamically optimized NMR ensemble of ubiquitin (PDB ID: 1XQQ) than it does with the RDC-refined ensemble of ubiquitin (r = 0.60). This plot clearly shows that the difference between the RMSD of the 1D3Z model and the RCI-predicted RMSD is much larger than the difference between the RCI-predicted RMSD and the RMSD of the dynamically optimized 1XQQ (mean difference of 0.42 Å vs. 0.11 Å for regular secondary structures). As indicated by this example, we believe that the RCI method offers considerable promise for the detection of insufficient or exaggerated structural variations in NMR ensembles.
RCI: its applicability to large and unfolded proteins
One of the key advantages of the RCI method is that it appears to be capable of predicting protein flexibility regardless of the protein weight, shape or domain composition. In particular, we have found that RCI values correlate very well with per-residue MD RMSF for proteins ranging in size from 56 to 283 amino acids (Supplemental Table 2) and in relative flexibility ranging from 15 to 65% (Supplemental Table 3). The insensitivity of the RCI method to protein molecular weight makes it a particularly useful tool to study the flexibility of large proteins. This is because larger proteins often present a challenge to conventional NMR relaxation measurements (which are required for model-free analysis) due to spectral overlap and weak signal intensity. Figure 3f shows an example of the high level of agreement between model-free order parameters predicted from the RCI method and the MD RMSF of a structural model (1L6N) of the 32.2 kDa HIV-1 Gag protein. This example also demonstrates the rather impressive performance of the RCI method in identifying mobile regions in a protein that consists of several domains connected by a flexible linker.
Multi-domain proteins often prove to be quite problematic for order parameter calculations. In particular, if the time-scales of individual domain motions and the overall tumbling are close, one may experience difficulties in characterizing the frequency and anisotropy of the overall rotation, which are needed for the model-free analysis. This can lead to difficulties in the accurate calculation of order parameters (Korzhnev et al. 1997). For large-amplitude domain motions, the model-free approach may often have to be replaced with a complex model-dependent analysis such as a Triple-Exponential Wobble-In-A-Cone approximation (Chang and Tjandra 2001). In contrast, the RCI approach requires no model fitting and provides an excellent alternative to explore and quantify intra-domain dynamics of flexible multi-domain proteins. As evident from the aforementioned example, RCI values appear to be insensitive to domain reorientation in the absence of frequent domain collisions. This is likely due to the dominant role of very local de-/shielding effects on chemical shifts.
The RCI time scale
A key question that has not yet been addressed in our previous work is: What is the time-scale of motions captured by the RCI method? The answer to this intriguing question is tightly bound to our understanding of the effects of protein dynamics on chemical shifts in proteins. The conformational averaging that affects the positions of individual peaks in an NMR spectrum and, therefore, the shifts used to calculate the RCI occurs on the time-scale of fast conformational exchange. Changes in intermediate and slow exchange processes affect peak intensities while having no influence on peak positions (chemical shifts). In a simple case of two-site conformational exchange, the upper time limit of fast exchange depends on the coalescence conditions (the frequency of exchange at which two slow-exchange peaks merge into a single peak) for a particular chemical shift difference between the two sites (Levitt 2001). It is reasonable to assume that the upper limit of the RCI time scale should also depend on the coalescence conditions of multi-site conformational exchange that each nucleus in the RCI formula (Eq. 1) experiences. In practical terms, protein NMR resonances are generally too weak to be detected at this theoretical coalescence point. Therefore, the actual upper limit of the RCI time-scale will correspond to the frequency of fast-intermediate exchange, the point at which peaks become observable in an NMR spectrum. In principle, the exchange rate (or frequency) corresponding to this transition point can be identified by calculating the amplitude of NMR resonances using McConnell’s extension of the Bloch equations (McConnell 1958) and comparing it with the expected noise level. However, the NMR signal in real experiments also depends on numerous factors, such as sample concentration, the efficiency of magnetization transfer in a given NMR experiment, the magnetic field of the NMR spectrometer, and the number of scans.
The necessity to make assumptions about these factors makes the value of purely theoretical calculations rather limited. Moreover, one should realize that the application of the RCI method and, thus, the RCI time-scale will depend not only on the presence of visible signals in the spectra, but also on the probability of these signals to be assigned to particular nuclei in the protein.
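To give a sense of scale for the coalescence condition mentioned above, the sketch below uses the standard textbook expression for two equally populated sites with no scalar coupling, k = πΔν/√2. It is not the McConnell-based calculation described in the text. The equal-population assumption, the function names, and the definition of the life-time as 1/k are choices made here for illustration.

```python
import math

def coalescence_rate(delta_nu_hz):
    """Rate constant k (s^-1) at coalescence for equally populated
    two-site exchange: k = pi * delta_nu / sqrt(2)."""
    if delta_nu_hz <= 0:
        raise ValueError("frequency difference must be positive")
    return math.pi * delta_nu_hz / math.sqrt(2)

def coalescence_lifetime(delta_nu_hz):
    """Life-time (s) of each state at coalescence, taken here as 1/k."""
    return 1.0 / coalescence_rate(delta_nu_hz)

for delta_nu in (10.0, 100.0):
    k = coalescence_rate(delta_nu)
    tau_ms = coalescence_lifetime(delta_nu) * 1e3
    print(f"{delta_nu:6.1f} Hz: k = {k:6.1f} s^-1, life-time = {tau_ms:.1f} ms")
```

Under these assumptions a 10 Hz frequency difference coalesces at a life-time of roughly 45 ms and a 100 Hz difference at roughly 4.5 ms, so larger shift differences require faster exchange before a single averaged peak is seen.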
Table: The upper limit of the RCI time-scale determined from the life-time ranges of exchanging states. Columns: types of nuclei; life-time of exchanging states (s) for a 10 Hz frequency difference; life-time of exchanging states (s) for a 100 Hz frequency difference. First row: Cα, Cβ, CO.
In addition to its utility in assessing the lower limit of the RCI time-scale, it is also possible to use MD methods to identify some of the slower motions affecting RCI values and compare their frequency with the results of the aforementioned theoretical calculations of the upper limit of the RCI time-scale. This was done by monitoring the chemical shift averaging process for different residues during MD simulations. Figure 9 (panels c and d) shows the dependence of the averaged Cα and N chemical shifts on the length of the averaging period for two helical and two coil residues. The averaging process normally starts from an initial (<100 ps) period of large amplitude chemical shift changes, followed by high-frequency chemical shift oscillations that trend towards the averaged chemical shifts after about 1 ns of temporal evolution. When the differences between the experimental and the predicted average chemical shifts from different residues are combined, it is clear that the averaging process makes the predicted chemical shifts somewhat more accurate (Fig. 9e and f). In helices, the amplitude is much smaller (Fig. 9e and f) and the average chemical shift often reaches its plateau value within 300–500 ps (Fig. 9c and d, Tyr34 and Asn56). In contrast, the average chemical shifts in coil regions often continue to experience significant changes (Fig. 9c and d, Asn72 and Thr76) throughout the course of their MD simulations (2.5 ns). This result suggests that the RCI can manifest motions on the time scale of nanoseconds and above, and is consistent with the sensitivity of the RCI method to slower motions (with sub-state life-times on μs-ms and shorter time-scales) as suggested by earlier theoretical calculations in this paper.
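The averaging analysis described above can be pictured as a running mean over the per-frame chemical shifts predicted along a trajectory. The sketch below assumes that a shift has already been predicted for every saved MD frame (for example with a shift predictor such as ShiftX). The function names, the plateau criterion and the numbers are hypothetical and are not taken from the analysis behind Fig. 9.

```python
def running_average(shifts):
    """Cumulative mean of a per-frame chemical shift series (ppm)."""
    averages = []
    total = 0.0
    for n, value in enumerate(shifts, start=1):
        total += value
        averages.append(total / n)
    return averages

def plateau_frame(shifts, tolerance_ppm):
    """Index of the first frame from which the running average stays
    within tolerance_ppm of the final average (assumed criterion)."""
    averages = running_average(shifts)
    final = averages[-1]
    last_outside = -1
    for i, avg in enumerate(averages):
        if abs(avg - final) > tolerance_ppm:
            last_outside = i
    return last_outside + 1

# Hypothetical Calpha shifts (ppm) predicted for successive MD frames.
helix_like = [58.0, 56.0, 57.0, 57.0]
print(running_average(helix_like))
print(plateau_frame(helix_like, tolerance_ppm=0.1))
```

Multiplying the returned frame index by the time between saved frames gives an estimate of the averaging period needed to reach the plateau; a residue whose running average never settles within the trajectory would return an index close to the last frame.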
Simulation of chemical shift averaging in PyJ using MD and ShiftX revealed that the motions in loop regions can result in significant fluctuations of chemical shifts (with one standard deviation up to 5 ppm for 13C and 15N shifts and up to 10 ppm for 1H; Fig. 9g and h). These fluctuations are comparable with the effects of torsion angle variations on chemical shifts observed in quantum mechanical calculations for Hα (Osapay and Case 1994), Cα (Oldfield 1995; Sun et al. 2002; Xu and Case 2002), Cβ (Oldfield 1995; Sun et al. 2002; Xu and Case 2002), CO (Xu and Case 2002) and N (Le and Oldfield 1996; Xu and Case 2002). Large chemical shift deviations co-localize with random coil chemical shifts in the PyJ structure (Fig. 9g and h, green line) and are comparable with 3D-specific contributions, such as the effect of hydrogen bonding (Dedios and Oldfield 1994), that are rare in loops. It is reasonable to conclude that random coil-like chemical shifts in mobile regions originate from chemical shift averaging and are not merely a result of the absence of 3D-specific de-/shielding. Hence, the upper and lower limits of the RCI time scale are primarily associated with the time scale of fast conformational exchange as discussed above.
Conclusions and RCI limitations
The RCI method is not without its faults. As discussed below, the RCI method is somewhat limited in (1) the detection of domain movements; (2) the handling of strongly shielded/deshielded residues or nuclei; and (3) the analysis of magnetically aligned proteins. In general, a dynamic process will change an RCI value only if it changes the averaged chemical shift of a particular nucleus. It is certainly possible that certain motions in a protein may increase the range of sampled chemical shifts without changing the averaged shift. In these cases, the RCI will not be sensitive to such motions.
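The point about motions that widen the range of sampled shifts without moving the average can be illustrated numerically. In the sketch below, both series of sampled Cα shifts are invented for illustration; they share the same mean, so a method that relies on the averaged (observed) shift cannot tell them apart, even though their spreads differ several-fold.

```python
import statistics

def summarize(shifts):
    """Return (mean, population standard deviation) of sampled shifts in ppm."""
    return statistics.mean(shifts), statistics.pstdev(shifts)

# Hypothetical instantaneous Calpha shifts (ppm) sampled by one nucleus.
restricted = [56.5, 57.5, 57.0, 57.0]   # small-amplitude motion
broad      = [55.0, 59.0, 54.0, 60.0]   # large-amplitude, symmetric motion

mean_r, spread_r = summarize(restricted)
mean_b, spread_b = summarize(broad)

# The observed (averaged) shift is identical in both cases ...
print(mean_r, mean_b)
# ... although the sampled range is very different.
print(round(spread_r, 2), round(spread_b, 2))
```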
Obviously, the local environment of a given residue or nucleus plays a critical role in the success and sensitivity of the RCI method. Fast conformational exchange will only alter the chemical shift distribution if it significantly changes nuclear shielding. A complex interplay between opposing shielding/de-shielding effects may result in rigid protein regions (α-helices or β-sheets) displaying small (loop-like) secondary chemical shifts. A recent review of residue-specific secondary chemical shifts revealed that a significant number of residues have relatively small characteristic secondary shifts for N, NH, and Cβ nuclei in α-helices or β-sheets (Wang and Jardetzky 2002a, b). For these residues, the RCI method may be able to predict flexibility only if the chemical shifts are affected by additional long-range shielding. To counter these effects, we recommend the inclusion of Cα, CO, and Hα shifts in RCI calculations to obtain more reliable results.
Certain types of correlated dynamics, such as domain movements or concerted motions of large protein segments, may not result in significant changes in the local structure and environment of moving residues (e.g., residues in the hydrophobic core of a domain). Such motions will not be detected by RCI. However, this can also be an advantage if one is interested in separating correlated motions from local dynamics (see the example of the HIV-1 Gag analysis above).
If both a nucleus and an environmental element that makes a dominant contribution to the chemical shift of that nucleus (e.g., an aromatic ring, a side-chain charge, or certain types of spin-labels) experience the same motional event, this dynamic process will likely fail to change the chemical shift (and hence the RCI) despite fluctuations of local structure. On the other hand, if a residue in a rigid area is in close proximity to a flexible protein segment with high shielding/de-shielding capabilities (e.g., N- and C-termini, aromatic and charged side-chains), motions of that segment may result in “coil-like” large amplitudes of fast conformational exchange of the rigid residue and elevated RCI values. In such a case, the RCI may incorrectly identify the rigid residue as a mobile one. The use of multiple chemical shifts (which are sensitive to different environmental factors) in combination with data smoothing is employed in the RCI protocol to decrease the likelihood of such occurrences.
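The effect of data smoothing on an isolated, spuriously elevated value can be sketched with a simple moving average along the sequence. The three-residue window, the handling of chain ends and the example values below are assumptions made for illustration; they are not a description of the smoothing actually implemented in the RCI protocol.

```python
def smooth(values, half_window=1):
    """Moving average along the sequence; the window shrinks at the chain ends."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

# Hypothetical per-residue RCI-like values: one rigid residue (index 2)
# is elevated by a nearby flexible, strongly de-shielding segment.
raw = [0.02, 0.02, 0.14, 0.02, 0.02]
print([round(v, 3) for v in smooth(raw)])
```

The isolated spike is reduced to less than half of its raw value, while a genuinely mobile stretch of several consecutive residues would retain most of its amplitude after the same averaging.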
Some caution should be exercised when interpreting the RCI values of magnetically aligned proteins. Incomplete averaging of the chemical shift anisotropy component of the Hamiltonian may result in significant chemical shift offsets. For example, it was shown that CO chemical shifts of leucine enkephalin may change by more than 3.0 ppm as the degree of alignment increases (Sanders and Landis 1994). Nitrogen shielding is expected to vary by as much as 200 ppm due to changes of peptide plane orientation with respect to the external magnetic field (Case 1998). In such cases, efforts should be made to assess the effect of magnetic alignment on chemical shifts and, if necessary, to predict the values of the corresponding isotropic chemical shifts prior to any RCI calculations.
Despite these caveats, we believe that the RCI method represents a very simple and robust addition to traditional methods of studying protein flexibility. While it cannot substitute for the actual measurement of relaxation parameters, it is particularly useful for gaining insights into protein dynamics in the absence of these data and for comparison with other kinds of experimentally or computationally acquired dynamics data.
This work was supported by the Natural Sciences and Engineering Research Council (NSERC), the National Research Council’s National Institute for Nanotechnology (NINT), the Protein Engineering Network of Centres of Excellence (PENCE), Alberta Prion Research Institute, and PrioNet Canada.