ORIUM: Optimized RDC-based Iterative and Unified Model-free analysis

Residual dipolar couplings (RDCs) are NMR parameters that provide both structural and dynamic information concerning inter-nuclear vectors, such as N–HN and Cα–Hα bonds within the protein backbone. Two approaches for extracting this information from RDCs are the model free analysis (MFA) (Meiler et al. in J Am Chem Soc 123:6098–6107, 2001; Peti et al. in J Am Chem Soc 124:5822–5833, 2002) and the direct interpretation of dipolar couplings (DIDCs) (Tolman in J Am Chem Soc 124:12020–12030, 2002). Both methods have been incorporated into iterative schemes, namely the self-consistent RDC based MFA (SCRM) (Lakomek et al. in J Biomol NMR 41:139–155, 2008) and iterative DIDC (Yao et al. in J Phys Chem B 112:6045–6056, 2008), with the goal of removing the influence of structural noise in the MFA and DIDC formulations. Here, we report a new iterative procedure entitled Optimized RDC-based Iterative and Unified Model-free analysis (ORIUM). ORIUM unifies theoretical concepts developed in the MFA, SCRM, and DIDC methods to construct a computationally less demanding approach to determine these structural and dynamic parameters. In all schemes, dynamic averaging reduces the actual magnitude of the alignment tensors complicating the determination of the absolute values for the generalized order parameters. To readdress this scaling issue that has been previously investigated (Lakomek et al. in J Biomol NMR 41:139–155, 2008; Salmon et al. in Angew Chem Int Edit 48:4154–4157, 2009), a new method is presented using only RDC data to establish a lower bound on protein motion, bypassing the requirement of Lipari–Szabo order parameters. ORIUM and the new scaling procedure are applied to the proteins ubiquitin and the third immunoglobulin domain of protein G (GB3). Our results indicate good agreement with the SCRM and iterative DIDC approaches and signify the general applicability of ORIUM and the proposed scaling for the extraction of inter-nuclear vector structural and dynamic content. 
Electronic supplementary material The online version of this article (doi:10.1007/s10858-013-9775-1) contains supplementary material, which is available to authorized users.


Introduction
Protein structure and dynamics are routinely investigated with NMR spectroscopy at atomic resolution. An essential NMR parameter that provides both structural and dynamic information is the residual dipolar coupling (RDC) between two nuclear magnetic moments, for example an N–HN or Cα–Hα vector within the polypeptide backbone of a protein (Tjandra and Bax 1997; Tolman et al. 1997). An important application of RDCs as a probe for protein dynamics was shown recently: RDCs measured in 36 different alignment media demonstrated that the ground state conformational ensemble of ubiquitin covers the conformational space captured in crystal structures of ubiquitin complexes (Lakomek et al. 2008). These findings provide strong support for conformational selection as a means for molecular recognition. Therefore, extracting the dynamical content from RDCs has implications for understanding the mechanisms of molecular recognition and protein function.
The significance of the RDC's dynamical content is highlighted when considering other approaches for studying protein dynamics. Measurements of NMR spin-relaxation provide information concerning amplitudes of inter-nuclear vector motions occurring on time-scales faster than the rotational correlation time ($\tau_c$) of the protein (picosecond to nanosecond) (Kay et al. 1989), which are parameterized by the Lipari-Szabo order parameter ($S^2_{\mathrm{LS}}$) (Lipari and Szabo 1982a, b). Relaxation dispersion methods probe the kinetics of conformational exchange that modulates the isotropic chemical shift and contributes to the effective transverse relaxation rate (Palmer 2004). To date, relaxation dispersion techniques have been limited to time-scales slower than approximately 25 μs (Ban et al. 2011, 2012). By contrast, RDCs provide vital insight into the amplitude and direction of internal vector motions on time-scales covering the previously inaccessible time window spanning $\tau_c$ to ~25 μs (referred to as the supra-$\tau_c$ window).
Residual dipolar couplings (RDCs) arise from placing a protein in an anisotropic medium, such as filamentous phages or lipid bilayers, or paramagnetic tagging, leading to partial alignment of the protein with respect to the external magnetic field. In the anisotropic media, all possible orientations for an inter-nuclear vector are populated with unequal probability, resulting in the dipolar couplings no longer averaging to zero. The magnitude of the measured RDC is given by the time-averaged angle between the inter-nuclear vector and the magnetic field (Tolman et al. 1997).
Since the potential to extract dynamics from RDCs was first recognized, two schemes for extracting the dynamical content from these NMR parameters in the form of a generalized order parameter ($S^2_{\mathrm{RDC}}$) have been proposed. In the model free analysis (MFA), five independent alignment media are necessary to calculate the five independent elements of the inter-nuclear vector tensor (Meiler et al. 2001; Peti et al. 2002). Figure 1 illustrates the three frames of reference used in the analysis of RDCs: the molecular frame (MF), the alignment frame (AF), and the vector frame (VF). Knowledge of the protein structure is necessary to determine the alignment tensors. With the alignment tensor information, the averages over the second rank spherical harmonics describing the mean orientations of the vectors, contained within the inter-nuclear vector tensor ($\langle Y_{2,m}(\theta, \phi)\rangle$, see also Fig. 2), provide the desired structural and dynamic content. An alternative approach, the direct interpretation of dipolar couplings (DIDCs), was developed to bypass the need for structural input in the calculation of the inter-nuclear vector's structural and dynamic content (Tolman 2002). Five independent alignment media are also necessary for the DIDC method. A single matrix equation is employed to represent the RDC data obtained in multiple alignment media. The inter-nuclear vector tensors are optimized simultaneously and variation in $S^2_{\mathrm{RDC}}$ is minimized. Both the MFA and DIDC methods have been incorporated into iterative schemes, termed the self-consistent RDC based MFA (SCRM) (Lakomek et al. 2008) and iterative DIDC (Yao et al. 2008), with the goal of improving the accuracy of the alignment tensor calculation by reducing the effects of structural noise. The iterative schemes achieve this by using the refined dynamically averaged coordinates as input for additional runs of either MFA or DIDC; however, each approach, as implemented, relies on computationally expensive procedures.
In the iterative DIDC method, a grid search is performed which minimizes the difference between the vector's coordinates and a pool of possible solutions built from an exhaustive list of $(\theta, \phi)$ combinations to find dynamically averaged coordinates (Yao et al. 2008). As for SCRM, the dynamic average orientation of each vector is calculated by performing a coordinate transformation with maximization of $\langle Y_{2,0}(\theta, \phi)\rangle$. Recently, it has been shown that this transformation can be replaced with the diagonalization of the local ordering Saupe matrix (Meirovitch et al. 2012), which is computationally less demanding.
Here, we describe a new iterative procedure for extracting structural and dynamic information from RDCs entitled Optimized RDC-based Iterative and Unified Model-free analysis (ORIUM). ORIUM unifies the theoretical concepts developed in the MFA, SCRM, and DIDC methods. In addition, a new method is presented to establish a lower bound on protein motion using RDC data alone, without requiring a separate determination of $S^2_{\mathrm{LS}}$. While this has been achieved previously (Salmon et al. 2009) based on several sets of RDCs assuming Gaussian fluctuations, the method introduced here works on a single set of RDCs and does not require a motional model. The applicability of ORIUM and the new scaling procedure is tested with the model proteins ubiquitin and the third immunoglobulin domain of protein G (GB3).

Theory
Optimized RDC-based Iterative and Unified Model-free analysis (ORIUM) consists of three principal stages for the extraction of RDC order parameters ($S^2_{\mathrm{RDC}}$) from data measured in multiple alignment media (see Fig. 2 for schematic diagram). First, the matrix formalism introduced by Tolman in the DIDC approach is utilized to calculate refined structural coordinates from the alignment tensors (Tolman 2002). From here, each refined vector is put into a local axis system in order to determine the vector specific structural and dynamic information (Meirovitch et al. 2012; Meiler et al. 2001; Peti et al. 2002). Finally, the resulting Euler angles are used as structural input to restart the calculation in an iterative fashion, similar to SCRM. ORIUM continues until the variation in $S^2_{\mathrm{RDC}}$ for the entire dataset falls below a certain threshold.

Alignment tensor calculation
For two nuclear spins, the observed resonance splitting (Hz) resulting from the partial alignment of a protein emanates from the secular part of the magnetic dipole interaction
$$D^{\mathrm{exp}}_k = D^{\max}_{ij}\left\langle \frac{3\cos^2\theta_k - 1}{2} \right\rangle = -\frac{\mu_0 \gamma_i \gamma_j \hbar}{4\pi^2 r_{ij}^3}\left\langle \frac{3\cos^2\theta_k - 1}{2} \right\rangle \qquad (1)$$
where $\mu_0$ is the permeability of vacuum, $\gamma_X$ is the gyromagnetic ratio of spin X, $\hbar$ is the reduced Planck constant, $r_{ij}$ is the distance between nuclei i and j (assumed to be fixed at 1.02 Å for the N–HN and 1.095 Å for the Cα–Hα vectors), and $\theta_k$ is the angle between the inter-nuclear vector formed by nuclear spin pair k and the magnetic field ($B_0$). The angular brackets denote ensemble averaging. As Eq. (1) explicitly illustrates, the magnitude of $D^{\mathrm{exp}}_k$ depends on $\langle (3\cos^2\theta_k - 1)/2 \rangle$. By definition, the term $\cos\theta_k$ is the scalar product between an inter-nuclear vector and the vector parallel to $B_0$.
Fig. 2 Schematic diagram of both ORIUM and SCRM protocols. Both procedures begin with a starting structure as input and use experimental RDCs to calculate the alignment tensors in the molecular frame (MF). In SCRM, the principal axis system of each alignment tensor is determined in order to extract $\{A^{\mathrm{PAS}}_{zz}, R, \alpha, \beta, \gamma\}$
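As a quick numerical check of the prefactor discussed above, the following sketch evaluates the maximum splitting for the two bond types at the fixed distances quoted in the text. It assumes the common convention $D^{\max} = -\mu_0 \gamma_i \gamma_j \hbar / (4\pi^2 r^3)$ for the splitting in Hz; the constant values and the helper name `d_max` are illustrative choices, not taken from the paper.

```python
import math

# Physical constants (SI units); values are standard reference values,
# not quoted from the paper.
MU0 = 4 * math.pi * 1e-7        # vacuum permeability (classical value), T m / A
HBAR = 1.054571817e-34          # reduced Planck constant, J s
GAMMA = {                       # gyromagnetic ratios, rad s^-1 T^-1
    "1H": 2.6752218744e8,
    "13C": 6.728284e7,
    "15N": -2.71261804e7,
}

def d_max(spin_i, spin_j, r_angstrom):
    """Maximum dipolar splitting in Hz for a fixed inter-nuclear distance.

    Assumed convention: D_max = -mu0 * g_i * g_j * hbar / (4 pi^2 r^3).
    """
    r = r_angstrom * 1e-10      # convert Angstrom to metres
    return -MU0 * GAMMA[spin_i] * GAMMA[spin_j] * HBAR / (4 * math.pi**2 * r**3)

d_nh = d_max("15N", "1H", 1.02)    # N-HN at 1.02 A
d_ch = d_max("13C", "1H", 1.095)   # Ca-Ha at 1.095 A
print(f"N-H: {d_nh / 1e3:.1f} kHz, C-H: {d_ch / 1e3:.1f} kHz")
```

Under these assumptions the N–HN value comes out at roughly 23 kHz and the Cα–Hα value at roughly −46 kHz; the opposite signs follow from the negative gyromagnetic ratio of 15N.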
When considering a rigid molecule, the coordinates of an inter-nuclear vector can be described within an arbitrary reference frame, termed the molecular frame (MF), and defined by three angles, $\beta_x$, $\beta_y$, and $\beta_z$, between the vector and the respective MF axes. In a similar fashion, the vector parallel to $B_0$ can be expressed by three angles representing the instantaneous orientation of $B_0$ relative to the MF axes, $\alpha_x$, $\alpha_y$, and $\alpha_z$. Within the MF, $D^{\mathrm{exp}}_k$ can be recast as Eq. (2), where $\langle B \cdot A \rangle$ is the scalar product of two vectors representing the inter-nuclear orientations (B) and the $B_0$ orientations (A). Here, A is the alignment tensor and B is the inter-nuclear vector tensor. Both A and B contain five independent terms and are related to a 3 × 3 second rank Cartesian order tensor (Saupe 1964, 1968; Snyder 1965), where the orientation of $B_0$ in the MF is given by Eq. (3) and the orientation of the inter-nuclear vector in the MF is described by Eq. (4). In these expressions, the term $\delta_{mn}$ represents the Kronecker delta function, l is the alignment condition, and m, n = x, y, z.
A matrix formalism is introduced to render analysis of the RDC data more intuitive (Tolman 2002). When K RDCs are measured under L alignments, Eq. (2) becomes
$$D = \langle B \rangle \langle A \rangle \qquad (5)$$
where D is a K × L matrix, B is a K × 5 matrix, and A is a 5 × L matrix. In Eq. (5), the term $D^{\max}_{ij}$ is included in $\langle A \rangle$. The rows of B are defined by Eq. (4) and the columns of A are given by Eq. (3). An inherent assumption in the present analysis is that inter-nuclear dynamics are uncorrelated with the alignment process; hence the averages $\langle B \rangle$ and $\langle A \rangle$ are independent of each other. This assumption can be tested with the SECONDA analysis (Hus and Brüschweiler 2002; Hus et al. 2003). When the structure of the molecule is known and RDCs for at least five linearly independent inter-nuclear vectors are measured, the matrix B (input from the rigid structure or random structural coordinates) and the measured RDCs are used to calculate $\langle A \rangle$
$$\langle A \rangle = B^{+} D \qquad (6)$$
where $B^{+}$ is the pseudo-inverse of B. It should be noted that a single alignment tensor per alignment medium is necessary for the successful application of the following protocols. Intrinsically disordered proteins (see Bertoncini et al. 2005; Bernadó et al. 2005) and multiple domain proteins (see Bertini et al. 2004; Rodriguez-Castañeda et al. 2006) will have to be described by several alignment tensors per alignment medium and will not be amenable to the present analysis.
Each column of $\langle A \rangle$, given by Eq. (3), can be recast into L symmetric 3 × 3 second rank Cartesian order tensors, $A^{(2)}_l$. These order matrices are then redefined in a principal axis system (PAS), termed the alignment frame (AF), where Eq. (1) becomes Eq. (7) (Bax et al. 2001). In Eq. (7), $D_{a,l}$ is the magnitude of the alignment tensor, $(\theta^{AF}_{k,l}, \phi^{AF}_{k,l})$ are the polar angles defining the inter-nuclear vector in the AF, and $A^{\mathrm{PAS}}_{mm,l}$ are the eigenvalues resulting from the diagonalization of $A^{(2)}_l$.
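The pseudo-inverse step described above can be sketched with NumPy. The five-component basis used here, $[(3z^2-1)/2,\ (x^2-y^2)/2,\ 2xy,\ 2xz,\ 2yz]$ for B and $[A_{zz},\ A_{xx}-A_{yy},\ A_{xy},\ A_{xz},\ A_{yz}]$ for A, is one convenient choice and is an assumption of this sketch; the paper's Eqs. (3) and (4) may use a different but equivalent ordering or scaling. The function names are hypothetical.

```python
import numpy as np

def b_row(v):
    """Five-component representation of a unit vector (assumed basis)."""
    x, y, z = np.asarray(v, float) / np.linalg.norm(v)
    return np.array([(3 * z * z - 1) / 2, (x * x - y * y) / 2,
                     2 * x * y, 2 * x * z, 2 * y * z])

def fit_alignment(vectors, D):
    """Least-squares alignment tensors: <A> = pinv(B) @ D.

    vectors : (K, 3) inter-nuclear vectors in the molecular frame
    D       : (K, L) RDCs, one column per alignment medium
    returns : (5, L) alignment tensor components, D_max absorbed into <A>
    """
    B = np.array([b_row(v) for v in vectors])   # K x 5
    return np.linalg.pinv(B) @ D                # 5 x L

def to_cartesian(a):
    """Rebuild the symmetric, traceless 3 x 3 tensor from five components."""
    azz, diff, axy, axz, ayz = a
    axx = (-azz + diff) / 2
    ayy = (-azz - diff) / 2
    return np.array([[axx, axy, axz],
                     [axy, ayy, ayz],
                     [axz, ayz, azz]])

def principal_values(a):
    """Eigenvalues and eigenvectors of one alignment tensor (its PAS)."""
    return np.linalg.eigh(to_cartesian(a))
```

With at least five linearly independent vectors, `fit_alignment` returns one column per medium, and `principal_values` diagonalizes a single column to obtain its principal axis system.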
From the eigenvectors $A^{\mathrm{EV}}_{mn,l}$, the Euler angles describing the rotation of $A^{(2)}_l$ into the PAS are defined.

Model free analysis
With the MFA (Meiler et al. 2001; Peti et al. 2002), the five parameters describing each alignment tensor in the PAS, $\{A^{\mathrm{PAS}}_{zz}, R, \alpha, \beta, \gamma\}_l$, are used to construct the F matrix, which is needed to derive the five dynamically averaged second order spherical harmonics. Equation (7) can be recast in terms of these dynamically averaged second order spherical harmonics. The F matrix relates the measured RDCs to the spherical harmonics defined in the MF by a Wigner rotation from the MF to the AF. In analogy to the component definition from Eq. (5), $\langle Y \rangle$ is a K × 5 matrix containing the dynamically averaged spherical harmonics in the MF and F is a 5 × L matrix containing the alignment tensor information. The $\langle Y \rangle_{\mathrm{refined}}$ matrix is determined in direct correspondence to Eq. (6)
$$\langle Y \rangle_{\mathrm{refined}} = D_{\mathrm{normalized}} F^{+} \qquad (13)$$
Here, $D_{\mathrm{normalized}}$ represents $D^{\mathrm{exp}}_{k,l} / A^{\mathrm{PAS}}_{zz,l}$ in order to normalize the contributions of each alignment condition to the calculation of refined structural coordinates. Each row of $\langle Y \rangle_{\mathrm{refined}}$ is used to determine $S^2_{\mathrm{RDC},k}$
$$S^2_{\mathrm{RDC},k} = \frac{4\pi}{5} \sum_{m=-2}^{2} \langle Y_{2,m}(\theta^{MF}_k, \phi^{MF}_k) \rangle \langle Y^{*}_{2,m}(\theta^{MF}_k, \phi^{MF}_k) \rangle \qquad (14)$$
From the dynamically averaged spherical harmonics, the dynamically averaged orientations for each inter-nuclear vector, $(\theta^{MF}_{\mathrm{avg},k}, \phi^{MF}_{\mathrm{avg},k})$, can be obtained. Maximizing $\langle Y_{2,0} \rangle$ places the z axis of the vector's axis system, termed the vector frame (VF), in the center of the inter-nuclear vector's orientational distribution. The terms $\langle Y_{2,\pm 1}(\theta^{MF}_k, \phi^{MF}_k) \rangle$ vanish in the VF, and $\langle Y_{2,\pm 2}(\theta^{MF}_k, \phi^{MF}_k) \rangle$ possesses information on the amplitude of anisotropy, $\eta_k$, and the orientation of anisotropic motions, $\phi'_k$. It should be noted that $S^2_{\mathrm{RDC},k}$ is the same in any frame; thus the corresponding sum evaluated in the VF is equivalent to Eq. (14).
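To illustrate how an order parameter follows from averaged second order spherical harmonics, the sketch below averages $Y_{2,m}$ over a small set of orientations and applies $S^2 = (4\pi/5)\sum_m |\langle Y_{2,m} \rangle|^2$, the form used in the model free analysis literature. The spherical harmonics are written out explicitly with the Condon-Shortley phase; the function names and the equal-weight ensemble are assumptions made for illustration.

```python
import numpy as np

def y2m(theta, phi):
    """Y_{2,m} for m = -2..2 (rows) at polar angles theta, phi in radians."""
    theta = np.atleast_1d(theta).astype(float)
    phi = np.atleast_1d(phi).astype(float)
    st, ct = np.sin(theta), np.cos(theta)
    y0 = np.sqrt(5 / (16 * np.pi)) * (3 * ct**2 - 1) + 0j
    y1 = -np.sqrt(15 / (8 * np.pi)) * st * ct * np.exp(1j * phi)
    y2 = np.sqrt(15 / (32 * np.pi)) * st**2 * np.exp(2j * phi)
    # Y_{2,-m} = (-1)^m conj(Y_{2,m})
    return np.array([np.conj(y2), -np.conj(y1), y0, y1, y2])

def s2_rdc(theta, phi):
    """Order parameter from equally weighted orientations of one vector."""
    y_avg = y2m(theta, phi).mean(axis=1)          # ensemble average per m
    return float(4 * np.pi / 5 * np.sum(np.abs(y_avg) ** 2))

# A rigid vector gives S^2 = 1 regardless of its orientation.
rigid = s2_rdc(0.3, 1.1)
# A vector jumping equally between the z and x axes gives S^2 = 1/4.
two_site = s2_rdc([0.0, np.pi / 2], [0.0, 0.0])
```

Because the sum runs over all five values of m, the result does not depend on the frame in which the angles are expressed, which is the property the text relies on when moving between the MF and the VF.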
For each inter-nuclear vector, $(\theta^{MF}_k, \phi^{MF}_k)$ and $\phi'_k$ are extracted from the transpose of the resulting $B^{\mathrm{EV}}_{mn,k}$ matrix.

Direct interpretation of dipolar couplings
With DIDC, once $\langle A \rangle$ is determined from Eq. (6), $\langle A \rangle$ is used to directly calculate a new set of dynamically averaged coordinates, $\langle B \rangle_{\mathrm{refined}}$, without extracting each set of $\{A^{\mathrm{PAS}}_{zz}, R, \alpha, \beta, \gamma\}_l$ (Eq. 24). This formula leaves the information for $\langle A \rangle$ in the MF. It should be noted that the previous implementations of DIDC did not scale the RDCs by $A^{\mathrm{PAS}}_{zz,l}$ as in the MFA (see Eq. 13), which is necessary to normalize the contributions of each alignment condition for the calculation of refined structural coordinates. Therefore, we have modified Eq. (24) to give Eq. (25), where $D_{\mathrm{normalized}}$ and $\langle A \rangle_{\mathrm{normalized}}$ represent the RDCs and alignment tensors divided by $A^{\mathrm{PAS}}_{zz,l}$. As described by Tolman, the first term in Eqs. (24) and (25) encompasses the contribution of the measured RDCs to determining $\langle B \rangle_{\mathrm{refined}}$ (Tolman 2002). When the rank of $\langle A \rangle$ is smaller than 5, the second term accounts for the degeneracy in the possible solutions that results from B. Otherwise, this term will equal zero for data representing more than five alignment media. With $\langle B \rangle_{\mathrm{refined}}$, the 3 × 3 second rank Cartesian tensor, $B_k$, for each inter-nuclear vector is constructed, diagonalized into the VF, and Eqs. (21) […] $\langle B \rangle_{\mathrm{refined}}$, the 3 × 3 symmetric Saupe matrix is constructed for the inter-nuclear vectors using expressions […] (23), each set of $\{S^2_{\mathrm{RDC}}, \eta, \theta^{MF}, \phi^{MF}, \phi'\}_k$ is extracted. These refined angles $(\theta^{MF}_k, \phi^{MF}_k)$ are used as input for the next cycle of ORIUM. The cycle is finished when convergence of the order parameters is achieved according to Eq. (26), where r is the iteration cycle.
The ORIUM approach differs from the SCRM method as follows. There is a minor difference: with SCRM, the inter-nuclear vector coordinates are defined in terms of spherical harmonics, while ORIUM utilizes Cartesian coordinates.
The relationship between the spherical harmonics and the Cartesian coordinates is given by Eqs. (19a) […] $B^{(2)}_k$ into a local axis system by diagonalization of the symmetric 3 × 3 second rank Cartesian tensor.
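A single refinement cycle of the kind described above can be sketched as follows. This is a minimal illustration under several stated assumptions, not the authors' implementation: the refined coordinates are computed with the form $\langle B \rangle_{\mathrm{refined}} = D A^{+} + B(1 - A A^{+})$ (with D and A divided column-wise by the largest-magnitude principal value of each alignment tensor), the five-component basis is the hypothetical one defined in `b_row`, the mean orientation is taken as the eigenvector of the largest signed eigenvalue, and the order parameter uses the identity $S^2 = \tfrac{2}{3}\sum_i \lambda_i^2$ for a traceless order tensor.

```python
import numpy as np

def b_row(v):
    """Five-component representation of a unit vector (assumed basis)."""
    x, y, z = np.asarray(v, float) / np.linalg.norm(v)
    return np.array([(3 * z * z - 1) / 2, (x * x - y * y) / 2,
                     2 * x * y, 2 * x * z, 2 * y * z])

def a_to_tensor(a):
    """3 x 3 alignment tensor from [Azz, Axx-Ayy, Axy, Axz, Ayz]."""
    azz, diff, axy, axz, ayz = a
    return np.array([[(-azz + diff) / 2, axy, axz],
                     [axy, (-azz - diff) / 2, ayz],
                     [axz, ayz, azz]])

def b_to_tensor(b):
    """3 x 3 order tensor <(3 r_m r_n - delta_mn)/2> from a b_row vector."""
    b0, b1, b2, b3, b4 = b
    return np.array([[(-b0 + 3 * b1) / 2, 0.75 * b2, 0.75 * b3],
                     [0.75 * b2, (-b0 - 3 * b1) / 2, 0.75 * b4],
                     [0.75 * b3, 0.75 * b4, b0]])

def orium_cycle(vectors, D):
    """One cycle: fit <A>, normalize, refine <B>, diagonalize per vector."""
    B = np.array([b_row(v) for v in vectors])            # K x 5
    A = np.linalg.pinv(B) @ D                            # 5 x L
    # Largest-magnitude principal value of each alignment tensor.
    azz = np.array([max(np.linalg.eigvalsh(a_to_tensor(a)), key=abs)
                    for a in A.T])
    Dn, An = D / azz, A / azz                            # column-wise scaling
    An_pinv = np.linalg.pinv(An)
    B_ref = Dn @ An_pinv + B @ (np.eye(5) - An @ An_pinv)
    new_vectors, s2 = [], []
    for b in B_ref:
        w, V = np.linalg.eigh(b_to_tensor(b))            # ascending eigenvalues
        new_vectors.append(V[:, np.argmax(w)])           # mean orientation
        s2.append(2.0 / 3.0 * np.sum(w ** 2))            # order parameter
    return np.array(new_vectors), np.array(s2)
```

In an iterative scheme, `orium_cycle` would be called repeatedly with the returned vectors until the change in the order parameters between cycles drops below a chosen tolerance; the precise convergence criterion of Eq. (26) is not reproduced here. For noise-free data generated from a rigid structure, the true structure is a fixed point of the cycle with all order parameters equal to one.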
There are three key differences between ORIUM and the iterative DIDC approach. First, a grid search is implemented with the iterative DIDC scheme, which minimizes the difference between the vector's coordinates obtained from $\langle B \rangle_{\mathrm{refined}}$ and an exhaustive list of $(\theta, \phi)$ combinations to find dynamically averaged coordinates. As stated above, ORIUM diagonalizes $B^{(2)}_k$ into a local axis system in order to extract this information. The second key difference is that with the iterative DIDC scheme each inter-nuclear vector is constrained to be rigid ($S^2_{\mathrm{RDC}} = 1$). Only during the final iterative run is the $S^2_{\mathrm{RDC}} = 1$ constraint removed. ORIUM never constrains the dynamics of the inter-nuclear vectors during the iterative procedure. A final divergence between the two procedures is how flexible inter-nuclear vectors are removed from the calculation of the alignment tensors. In the iterative DIDC procedure, RDC data for an individual inter-nuclear vector are removed from the calculation of the alignment tensors if the error in the experimental and back-calculated RDCs is greater than a factor of 2. The calculation is restarted and RDC data for the next inter-nuclear vector are once again removed from the calculation if the deviation is greater than a factor of 2. This procedure is repeated until all inter-nuclear vectors fulfill the threshold for the error in experimental and back-calculated RDCs. At this point, the $S^2_{\mathrm{RDC}} = 1$ constraint is removed and a final iteration is performed. ORIUM removes the most flexible residues ($S^2_{\mathrm{RDC}} \le 0.95$) after Eq. (26) has been satisfied (see below), and then ORIUM is restarted until Eq. (26) is once again fulfilled.
As with the RDC-based model free analysis, the fundamental assumption is that the internal protein dynamics for each inter-nuclear vector are uncorrelated with fluctuations of the alignment tensor. Thus, a single average alignment tensor can be utilized for each medium. Molecular dynamics simulations indicate that this assumption is true for secondary structural elements; however, $\langle B \rangle$ and $\langle A \rangle$ dynamics may be correlated for the most mobile regions of a protein (Louhivuori et al. 2006; Salvatella et al. 2008). To circumvent this potential inseparability of mobile inter-nuclear vectors and the alignment tensor fluctuations, the approach outlined in the SCRM procedure is followed. After convergence is achieved with Eq. (26), the residues that are the most mobile, as determined by fulfilling the relationship $S^2_{\mathrm{RDC}} \le 0.95$, are removed from the $\langle A \rangle$ calculation and ORIUM is restarted with $\langle B \rangle_{\mathrm{refined}}$ from the previous iteration until Eq. (26) is once again satisfied.
The validity of ORIUM was assessed with synthetic RDC data containing a measurement error (0.3 Hz) for the 36 alignment media, which were generated using the RDC refined ubiquitin ensemble ERNST (PDB:2KOX). The corresponding dynamic parameters ($S^2_{\mathrm{RDC}}$ and $\eta$) were also calculated using the same ensemble. ORIUM was run on these synthetic RDC data and the resulting dynamic parameters were compared with those calculated from the ensemble. The Pearson correlations of $S^2_{\mathrm{RDC}}$ and $\eta$ are 0.97 and 0.93, respectively. It should be noted that the local PAS differs from the VF when $B_{zz,k}$ is a negative value, although the local PAS is usually the VF. In this case, the averaged vector orientation is actually orthogonal to the z axis of the PAS. This issue can be alleviated by choosing a new axis system referred to as the vector frame system (VFS), with eigenvalues ordered $B_{zz,k} \ge B_{xx,k} \ge B_{yy,k}$ instead of $|B_{zz,k}| \ge |B_{xx,k}| \ge |B_{yy,k}|$. It should also be noted that $\eta$ from the VFS and the PAS can be significantly different in the case that $B_{zz,k}$ has a negative value. ORIUM utilizes the VFS after removal of residues with $S^2_{\mathrm{RDC}} \le 0.95$ to obtain dynamically averaged angles of the bond vector distribution.
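The difference between the two eigenvalue orderings can be made concrete with a short sketch. The function names `pas_order` and `vfs_order` are hypothetical; each returns the eigenvalues of a symmetric traceless tensor as (zz, xx, yy) under the ordering rule stated in the text.

```python
import numpy as np

def pas_order(tensor):
    """Eigenvalues as (zz, xx, yy) ordered by absolute value (local PAS)."""
    w = np.linalg.eigvalsh(tensor)
    return w[np.argsort(-np.abs(w))]

def vfs_order(tensor):
    """Eigenvalues as (zz, xx, yy) ordered by signed value (VFS)."""
    w = np.linalg.eigvalsh(tensor)      # ascending
    return w[::-1]                      # descending signed order

# Example tensor whose largest-magnitude eigenvalue is negative.
tensor = np.diag([0.30, 0.25, -0.55])
pas = pas_order(tensor)   # zz is the negative eigenvalue
vfs = vfs_order(tensor)   # zz is the largest positive eigenvalue
```

For this example the PAS assigns zz to −0.55, whereas the VFS assigns zz to 0.30, so the two frames place the z axis along different eigenvectors.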

Determination of $S^2_{\mathrm{RDC}}$ scaling factor: $S_{\mathrm{overall}}$
An inherent complication when calculating $S^2_{\mathrm{RDC}}$ from experimental RDCs is that dynamic averaging will reduce the actual magnitude of $A^{\mathrm{PAS}}_{zz,l}$ or $D_{a,l}$. This reduced magnitude will result in some $S^2_{\mathrm{RDC}}$ parameters over 1, and thus $S^2_{\mathrm{RDC}}$ can only be considered a relative gauge of the actual amplitudes of motion, defined as $S^2_{\mathrm{RDC,unscaled}}$ (Lakomek et al. 2006; Meiler et al. 2001). It should be noted that the alignment tensor parameters $\{R, \alpha, \beta, \gamma\}_l$ are unaffected by the reduction in the magnitude of $A^{\mathrm{PAS}}_{zz,l}$ or $D_{a,l}$. Two avenues to circumvent this complication have been developed. Either all the order parameters are scaled relative to the largest $S^2_{\mathrm{RDC,unscaled}}$, leaving one order parameter equal to one (iterative DIDC approach) (Tolman 2002; Yao et al. 2008), or $S^2_{\mathrm{RDC,unscaled}}$ is scaled relative to the Lipari-Szabo order parameters ($S^2_{\mathrm{LS}}$) calculated for each residue (MFA/SCRM approach) (Lakomek et al. 2006). The problem with the first approach is that the resulting $S^2_{\mathrm{RDC}}$ parameters will underestimate the amplitude of motion for each inter-nuclear vector. Overestimation can only occur if the largest $S^2_{\mathrm{RDC,unscaled}}$ parameter has a large experimental error, leading to an artificially greater value for this parameter than in reality. Sub- and supra-$\tau_c$ motion happening for each vector equally will not be picked up by this approach, underestimating the motion except for the mentioned case. As for the second approach, $S^2_{\mathrm{LS}}$ values are required, which may not be available for the vectors being analyzed. While this approach has been successfully applied, it may also underestimate motion since a general supra-$\tau_c$ motion affecting all the nuclei will not be picked up by this approach. Comparison of the $S_{\mathrm{overall}}$ derived in Lakomek et al. 2008 and Lange et al. 2008 with the average order parameter from solid state data (Schanda et al. 2010) shows that the solid state NMR derived average order parameter is smaller than the one derived by this second approach, suggesting that supra-$\tau_c$ motion affecting all nuclei is seen by solid state NMR but not by the $S^2_{\mathrm{LS}}$ versus $S^2_{\mathrm{RDC}}$ approach.
J Biomol NMR (2014) 58:287-301
Here, we present a new method for determining $S_{\mathrm{overall}}$ without the requirement of additional information, such as $S^2_{\mathrm{LS}}$, which may not be available for the inter-nuclear vectors under investigation. The scaling procedure separates an inter-nuclear vector's motion into its principal axes in Cartesian space and leads to parameters that have a more straightforward physical interpretation. The inter-nuclear vector's motional variance is directly related to the eigenvalues obtained from diagonalization of $B^{(2)}_k$ into a local axis system. The methodology outlined below exploits the fact that variance cannot be negative by definition. Therefore, a uniform scaling parameter, $S_{\mathrm{overall}}$, is necessary to ensure that the variance for each inter-nuclear vector about each of the three principal axes is positive. In the following, we present a brief outline for the derivation of bond vector motional variance for the determination of $S_{\mathrm{overall}}$.
For each vector, the relationships between the dynamically averaged eigenvalues and the unit vector coordinates (x, y, z) within the VF, as shown in Eq. (4a), are as follows
$$B_{xx} = \frac{3\langle x^2 \rangle - 1}{2}, \quad B_{yy} = \frac{3\langle y^2 \rangle - 1}{2}, \quad B_{zz} = \frac{3\langle z^2 \rangle - 1}{2} \qquad (27)$$
The normalization condition sets $x^2 + y^2 + z^2 = 1$, which also implies $\langle x^2 \rangle + \langle y^2 \rangle + \langle z^2 \rangle = 1$. Therefore, $B_{zz}$ can be recast as
$$B_{zz} = 1 - \frac{3}{2}\left(\langle x^2 \rangle + \langle y^2 \rangle\right)$$
Utilizing the definitions of $S^2_{\mathrm{RDC}}$ and $\eta$ [Eqs. (21) and (22)], we can now reformulate $S^2_{\mathrm{RDC}}$ and $\eta$ in terms of the Cartesian coordinates defined within the VF. The definition of variance is $\sigma^2_k = \langle k^2 \rangle - \langle k \rangle^2$, with k = x, y. Therefore, $\sigma^2_k$ can be substituted for $\langle k^2 \rangle$, and $S^2_{\mathrm{RDC}}$ and $\eta$ are then defined in terms of variance. Solving the system of equations gives the inverse relationships. A graphical depiction of the mapping between these parameters is shown in Figure S1. Using the relationship $S^2_{\mathrm{RDC}} = S^2_{\mathrm{overall}} S^2_{\mathrm{RDC,unscaled}}$, these equations can be rewritten in terms of the unscaled parameters. Since the variance must always be positive, the axis with the least variance ($\sigma^2_y$) should also be positive. Thus, inequalities are derived relating $S^2_{\mathrm{RDC}}$ and $\eta$ to $\sigma^2_y$ (Eq. 38). Using Eq. (38), residue-specific $S^{\max}_{\mathrm{overall}}$ can be obtained using $S^2_{\mathrm{RDC}}$ and $\eta$, or from the lowest eigenvalue. The eigenvalue definition of $S^{\max}_{\mathrm{overall}}$ follows directly from Eq. (27).
Since the reduction of magnitude in the alignment due to dynamic averaging is a global effect throughout all residues, the smallest residue-specific $S^{\max}_{\mathrm{overall}}$ may be utilized as the scaling factor if there is no experimental error. The previous method in which all order parameters are scaled relative to the largest $S^2_{\mathrm{RDC,unscaled}}$, leaving one order parameter equal to one (iterative DIDC approach) (Tolman 2002; Yao et al. 2008), is related to this new approach. If bond vector anisotropy is assumed to be axially symmetric ($\eta = 0$), $S^{\max}_{\mathrm{overall}}$ turns into $1/S_{\mathrm{RDC,unscaled}}$. This is identical to scaling all inter-nuclear vectors such that the largest is 1 (Figure S1).
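The eigenvalue route to the scaling factor can be sketched numerically. The sketch assumes that the eigenvalues of the order tensor relate to the coordinate averages as $\lambda = (3\langle k^2 \rangle - 1)/2$, that $\langle k^2 \rangle$ can stand in for the variance along each principal axis, and that the unscaled eigenvalues scale linearly with $S_{\mathrm{overall}}$. Under those assumptions the non-negativity of the smallest variance gives $S^{\max}_{\mathrm{overall}} = -1/(2\lambda_{\min})$ per residue. This closed form is derived here for illustration and is not quoted from the paper; the function names are hypothetical.

```python
import numpy as np

def s_overall_max(eigs_unscaled):
    """Residue-specific and global upper bounds on S_overall.

    eigs_unscaled : (K, 3) eigenvalues of each unscaled, traceless
                    inter-nuclear order tensor
    """
    eigs = np.asarray(eigs_unscaled, float)
    lowest = eigs.min(axis=1)                 # most negative eigenvalue
    per_residue = -1.0 / (2.0 * lowest)       # from (2 S lambda + 1)/3 >= 0
    return per_residue, per_residue.min()     # global bound: smallest value

def variances(eigs_unscaled, s_overall):
    """Mean-square coordinates along the principal axes after scaling."""
    return (2.0 * s_overall * np.asarray(eigs_unscaled, float) + 1.0) / 3.0

# Two hypothetical residues: one axially symmetric, one anisotropic.
eigs = [[0.9, -0.45, -0.45],
        [1.0, -0.40, -0.60]]
per_residue, s_global = s_overall_max(eigs)
sigma2 = variances(eigs, s_global)
```

For the axially symmetric residue the bound reduces to the reciprocal of its largest eigenvalue (1/0.9), consistent with the $\eta = 0$ limit described in the text, while the anisotropic residue sets the tighter global bound.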
This scaling approach using the lowest residue-specific $S^{\max}_{\mathrm{overall}}$ may introduce a systematic bias because experimental data contain errors. In order to alleviate the systematic bias, we used a statistical procedure accounting for the effect of experimental noise on $S_{\mathrm{overall}}$ without any knowledge of $S^2_{\mathrm{LS}}$, unlike the SCRM approach. First, scaling factors were calculated from the original data as well as from datasets with noise added equivalent to the experimental error. These scaling factors were used to determine a value (which we term $S^{95\%}_{\mathrm{overall}}$), below which there was a 95 % chance that the true $S_{\mathrm{overall}}$ would fall. Given the maximum scaling factor from the original data that fulfills the constraint equation for all inter-nuclear vectors, $S^{\max}_{\mathrm{overall}}$, and the corresponding set of values from noise added data (NAD), $S^{\max}_{\mathrm{overall,NAD}}$, the $S^{95\%}_{\mathrm{overall}}$ value can be calculated using a quantile function that returns the given quantile of the set. The quantile prefactor compensates for systematic shifts resulting from the addition of experimental error. In the previous study, the determination of $S_{\mathrm{overall}}$ was conservative in order to circumvent the chance of over-estimating the supra-$\tau_c$ motion, reflected in the reported $S^2_{\mathrm{RDC}}$. Here, the criterion for scaling is that $\sigma^2_y$ should be positive, which possesses no time-scale bias. Yet, it should be noted that this overall order parameter is an upper limit for $S_{\mathrm{overall}}$, since it could underestimate motion if there is a uniform sub- or supra-$\tau_c$ motion affecting all vectors. This is summarized in Table 1.

Applications
Ubiquitin: comparison of ORIUM with SCRM
In order to compare ORIUM with the SCRM method, N–HN RDCs were used from measurements performed in 36 different alignment media (D36M) for the 76-residue protein ubiquitin (see Lakomek et al. 2008 for RDCs and references therein). The X-ray structure 1UBI of ubiquitin was used as the input structure for the first cycle of ORIUM (Ramage et al. 1994). For the error estimation of each extracted set of $\{S^2_{\mathrm{RDC}}, \eta\}_k$, 1,000 Monte Carlo simulations were performed by adding uncertainty to the RDCs drawn from a Gaussian distribution with a standard deviation given by the error in the RDC set (0.3 Hz). On a single core of an Intel Core i7-2635QM CPU, the 1,000 Monte Carlo simulations required 18 min for ORIUM versus 83 min for SCRM, where the convergence criterion was set as in the SCRM implementation. When we used a 100-fold stricter convergence criterion than SCRM (see Eq. 26), the calculation was still faster on the same CPU (74 min). Thus, ORIUM is an optimized approach for the better convergence of the dynamic parameters.
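The Monte Carlo error estimation described above can be illustrated on the simplest step of the analysis, the linear alignment tensor fit. The sketch below adds Gaussian noise with a 0.3 Hz standard deviation to the RDC matrix, refits, and reports the spread of the fitted components. It is a simplified stand-in: in the paper the noise is propagated through the full iterative procedure to obtain errors on the order parameters, and the function name and defaults here are assumptions.

```python
import numpy as np

def mc_alignment_errors(B, D, sigma=0.3, n_trials=1000, seed=0):
    """Monte Carlo spread of <A> = pinv(B) @ D under Gaussian RDC noise.

    B : (K, 5) structural matrix, D : (K, L) RDCs in Hz
    returns the mean and sample standard deviation of the (5, L) fit
    """
    rng = np.random.default_rng(seed)
    B_pinv = np.linalg.pinv(B)
    samples = np.empty((n_trials, B.shape[1], D.shape[1]))
    for t in range(n_trials):
        noisy = D + rng.normal(scale=sigma, size=D.shape)  # perturb RDCs
        samples[t] = B_pinv @ noisy                        # refit tensors
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)
```

For a purely linear fit the Monte Carlo standard deviations should approach the analytic values $\sigma \sqrt{\mathrm{diag}[(B^T B)^{-1}]}$, which gives a convenient sanity check before applying the same resampling to a nonlinear, iterative analysis.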
A comparison of the D36M RDC data set analyzed with ORIUM and the SCRM method (re-implemented in this study) is presented in Fig. 3, which shows $S^2_{\mathrm{RDC}}$ and $\eta$ determined for each residue (see Table S1 for the actual values calculated from the ORIUM analysis of the D36M data set). The correlation coefficients for the $S^2_{\mathrm{RDC}}$ and $\eta$ parameters are both 0.99, which shows that in principle both the iterative DIDC and SCRM should yield identical results. However, in the previous implementations of DIDC (Tolman 2002; Yao et al. 2008), the effect of the alignment magnitude on the angle calculation, as shown in Eqs. (24) and (25), was not recognized, leading to variations in the $S^2_{\mathrm{RDC}}$ and $\eta$ values (Figure S2). To determine whether normalization by alignment strength produces more accurate results, we compared Q factors for each alignment condition (Figure S3). For both the standard fitting procedure and a cross validation procedure, normalization produced significantly better Q factors (with respective p values of 0.022 and 0.00033). In the unnormalized case without cross validation, stronger alignment conditions showed lower Q factors than weak conditions, indicating that they were contributing disproportionately to the fit. This lack of proportionality is also evident in an examination of the alignment tensors themselves. To estimate the degree to which the five dimensional alignment tensor space is uniformly sampled, condition numbers can be used, where a lower value indicates more uniform sampling (Peti et al. 2002). We checked the condition numbers of $\langle A \rangle$ and $\langle A \rangle_{\mathrm{normalized}}$ for ORIUM implemented with Eqs. (24) and (25), respectively. Figure S4 plots the condition number versus ORIUM iteration until Eq. (26) is satisfied. For the unnormalized alignment tensors ($\langle A \rangle$ using Eq. 24), the condition number finishes at 8.36, while for the normalized alignment tensors ($\langle A \rangle_{\mathrm{normalized}}$ using Eq. 25) the condition number is significantly lower, finishing at 6.19. This shows that normalization of the RDCs based on $A^{\mathrm{PAS}}_{zz,l}$ is important to adjust the contributions of strong versus weak alignments in the calculation of inter-nuclear vector orientations and dynamics. It should also be mentioned that the different values of the condition numbers do not directly indicate the reliability of the dynamics but rather the degree to which alignment space is uniformly sampled.
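The two diagnostics used in this comparison can be sketched briefly. The Q factor below follows a commonly used definition, the root-mean-square deviation between experimental and back-calculated RDCs divided by the root-mean-square of the experimental RDCs; the paper may use a variant. The normalization divides each alignment tensor column by its largest-magnitude principal value, using a hypothetical five-component basis $[A_{zz},\ A_{xx}-A_{yy},\ A_{xy},\ A_{xz},\ A_{yz}]$, and the toy matrix is purely illustrative.

```python
import numpy as np

def q_factor(d_exp, d_calc):
    """Q = rms(D_exp - D_calc) / rms(D_exp) for one alignment condition."""
    d_exp = np.asarray(d_exp, float)
    d_calc = np.asarray(d_calc, float)
    return float(np.sqrt(np.sum((d_exp - d_calc) ** 2) / np.sum(d_exp ** 2)))

def a_to_tensor(a):
    """3 x 3 alignment tensor from [Azz, Axx-Ayy, Axy, Axz, Ayz]."""
    azz, diff, axy, axz, ayz = a
    return np.array([[(-azz + diff) / 2, axy, axz],
                     [axy, (-azz - diff) / 2, ayz],
                     [axz, ayz, azz]])

def normalize_columns(A):
    """Divide each column of <A> by its largest-magnitude principal value."""
    azz = np.array([max(np.linalg.eigvalsh(a_to_tensor(a)), key=abs)
                    for a in A.T])
    return A / azz

# Toy example: five media with very different alignment strengths.
A = np.eye(5) * np.array([1.0, 2.0, 4.0, 8.0, 16.0])
cond_raw = np.linalg.cond(A)                       # dominated by strong media
cond_norm = np.linalg.cond(normalize_columns(A))   # after normalization
```

In this toy case the condition number drops from 16 to 2 after normalization, mirroring the qualitative trend reported in the text; the numbers themselves have no relation to the ubiquitin data.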
The slight deviation in $S^2_{\mathrm{RDC}}$ between ORIUM and SCRM (Fig. 3b) originates from the $S_{\mathrm{overall}}$, which is 0.87 from the ORIUM approach compared with the reported value of 0.89 from the SCRM report using $S^2_{\mathrm{LS}}$ to calculate $S_{\mathrm{overall}}$. Remarkably, the present approach for $S_{\mathrm{overall}}$ determination yields scaled $S^2_{\mathrm{RDC}}$ parameters that are below or within error of the $S^2_{\mathrm{LS}}$ values without utilizing $S^2_{\mathrm{LS}}$ in the calculation of $S_{\mathrm{overall}}$. Due to the slightly lower $S_{\mathrm{overall}}$ with ORIUM, the average $S^2_{\mathrm{RDC}}$ for ORIUM and SCRM is 0.69 and 0.72, respectively.
In addition to starting with the 1UBI structure, the ORIUM calculation was also tested with random coil input structural coordinates, and the results are identical to those starting with 1UBI. This worked for ORIUM and not other procedures like SCRM because at the beginning of iteration, a sizable fraction of residues used for alignment tensor calculation had their largest eigenvalue with a negative sign. For these residues, the angle used for tensor calculation was then orthogonal to the mean angle. By the time the iteration was completed, no residues had negative maximum eigenvalues. This result shows the potential utility of ORIUM in the calculation of structural parameters from random structural input and perhaps in the refinement of conformational ensembles. Both ideas are currently under investigation.
It is also interesting to compare the results from ORIUM, specifically in regard to the calculated $S_{\mathrm{overall}}$ of 0.87, to a recent study examining the dynamics of ubiquitin in the microcrystalline state (Schanda et al. 2010). Here, the scaling of the solid-state order parameters ($S^2_{\mathrm{SS}}$) is unnecessary since the protein is not tumbling, and therefore the calculated order parameters should reflect the absolute magnitude of the amplitudes of motion for each inter-nuclear vector. The time-scale of motion embodied in $S^2_{\mathrm{SS}}$ spans up to single-digit microseconds (Chevelkov et al. 2010), whereas $S^2_{\mathrm{RDC}}$ encompasses motion up to about one millisecond. In principle, $S^2_{\mathrm{SS}}$ is expected to be higher than $S^2_{\mathrm{RDC}}$ due to the time-scale of motion embodied by the two order parameters, assuming that the conditions for ubiquitin in the microcrystalline state and in solution are identical. The previously reported $S^2_{\mathrm{RDC}}$ values using an $S_{\mathrm{overall}}$ of 0.89 (Salmon et al. 2009) are on average higher than the order parameters reported for the microcrystalline state (Schanda et al. 2010). The average order parameter values of $S^2_{\mathrm{RDC}}$ and $S^2_{\mathrm{SS}}$ for residues 2-70 are 0.75 and 0.72, respectively. This may be due to the fact that the determination of $S_{\mathrm{overall}}$ was done to circumvent the chance of over-estimating the supra-$\tau_c$ motion, and thus the $S_{\mathrm{overall}}$ was too conservative, as described earlier. Figure S5 shows that, with the approach presented here for the calculation of $S_{\mathrm{overall}}$ without any time-scale bias, most of the residues possess $S^2_{\mathrm{RDC}}$ values that are comparable to or within the error of $S^2_{\mathrm{SS}}$.

GB3: comparison of ORIUM with iterative DIDC
We compared ORIUM with the iterative DIDC method. N–HN and Cα–Hα RDCs were used from measurements performed on 6 mutants of the third immunoglobulin domain of protein G (GB3) aligned with Pf1 phages (Yao et al. 2008). The reported errors in the N–HN and Cα–Hα RDCs were 0.2 and 0.4 Hz, respectively. The NMR structure 2OED of GB3 was used as the input structure for the first cycle of ORIUM (Ulmer et al. 2003). From the SECONDA analysis of the RDC data as described in the iterative DIDC publication (see Fig. 4 in Yao et al. 2008), the following RDCs were removed from the entire analysis due to inconsistencies in the structure and dynamics for these inter-nuclear vectors over the 6 different alignment media: residues 19 and 41 for the N–HN RDCs and residues 11, 25, 30, and 40 for the Cα–Hα RDCs. Error estimation followed the same protocol used for ubiquitin. The ORIUM calculation was performed with only the N–HN RDCs or only the Cα–Hα RDCs, using the actual errors in the RDC data of 0.2 and 0.4 Hz for the N–HN RDCs and the Cα–Hα RDCs, respectively. The results are illustrated in Figs. 4 and 5 and compiled in Tables S2 and S3. The correlation for the N–HN RDCs is 0.91 for $S^2_{\mathrm{RDC}}$ and 0.92 for $\eta$. As for the Cα–Hα RDCs, the correlation coefficients for $S^2_{\mathrm{RDC}}$ and $\eta$ are 0.85 and 0.77, respectively. Here, the $S_{\mathrm{overall}}$ calculated from the ORIUM approach is 0.83 for both RDC types. This $S_{\mathrm{overall}}$ scaling leads to an average N–HN $S^2_{\mathrm{RDC}}$ of 0.65 and Cα–Hα $S^2_{\mathrm{RDC}}$ of 0.66. Thus, the $S^2_{\mathrm{RDC}}$ values are on average 22% lower than in the iterative DIDC publication. As shown in Figs. 4e and 5e, iterative DIDC has significantly more negative variances than ORIUM. This is primarily because the $\sigma_y^2 \geq 0$ constraint used for ORIUM is more restrictive than the $S^2_{\mathrm{RDC}} \leq 1$ constraint used for iterative DIDC (Figure S1), making the ORIUM $S^2_{\mathrm{RDC}}$ values lower than those of iterative DIDC. In principle, both ORIUM and the iterative DIDC should give identical results as described earlier.
The discrepancy reflected in the correlations may originate from the removal of the bias in the calculated structural and dynamic parameters due to the magnitude of alignment media [see Eqs. (24) and (25)]. For the calculations with unnormalized ORIUM, uncertainties of 0.3 and 0.6 Hz were used for the MC analysis, as done in the iterative DIDC method, and the Cα–Hα RDCs were scaled by a factor of $2.08^{-1}$ (Bax et al. 2001) (see Figures S6 and S7). Because of the small number of alignment conditions for GB3, it is not possible to determine significant differences in a Q factor analysis with and without normalization. However, we checked the condition numbers of $\langle A \rangle$ (Peti et al. 2002) for ORIUM implemented with either Eq. (24) or Eq. (25) (see Figure S8). For the unnormalized alignment tensors, the condition number finishes at 7.84, while for the normalized alignment tensors the condition numbers are again lower, finishing at an average of 7.15. In the unnormalized comparison, the correlation for the N–HN RDCs is 0.94 for $S^2_{\mathrm{RDC}}$ and 0.96 for $\eta$. As for the Cα–Hα RDCs, the correlation coefficients for $S^2_{\mathrm{RDC}}$ and $\eta$ are 0.88 and 0.91, respectively. Although the correlations are improved, the remaining discrepancies may result from the differences in the implementation of ORIUM versus the iterative DIDC. The iterative DIDC method utilizes a grid search to find $(\theta^{\mathrm{MF}}_{\mathrm{avg},k}, \phi^{\mathrm{MF}}_{\mathrm{avg},k})$ for each inter-nuclear vector. During the grid search, each vector is constrained to be rigid ($S^2_{\mathrm{RDC}} = 1$) until the final iterative run, when the constraints are lifted and the dynamic parameters are calculated. As in the case of normalized ORIUM, unnormalized ORIUM shows lower $S^2_{\mathrm{RDC}}$ values than DIDC, which results from the less restrictive DIDC constraint ($S^2_{\mathrm{RDC}} \leq 1$) allowing more negative variances (Figures S6 and S7).

[Figure caption fragment] ... parameters (Chang and Tjandra 2005). b $S^2_{\mathrm{RDC}}$ correlation plot. c $\eta$ plot by residue. d $\eta$ correlation plot

Fig. 4 Comparison of ORIUM (blue) and iterative DIDC (green) derived N–HN $S^2_{\mathrm{RDC}}$ and $\eta$ parameters for GB3. The ORIUM calculation was performed with only the N–HN RDCs. For ORIUM, the estimated error results from 1,000 Monte Carlo simulations that add uncertainty to the RDCs drawn from a Gaussian distribution with a standard deviation given by the error in the experimental RDCs reported in the iterative DIDC publication of 0.2 Hz. From the iterative DIDC analysis, 100 Monte Carlo simulations were performed with an uncertainty of 0.3 Hz, and $\eta$ is determined only for residues that were not fit to an isotropic motional model (Yao et al. 2008). a $S^2_{\mathrm{RDC}}$ plot by residue. The solid line represents the $S^2_{\mathrm{LS}}$ parameters (Hall and Fushman 2003). b $S^2_{\mathrm{RDC}}$ correlation plot. c $\eta$ plot by residue. d $\eta$ correlation plot. e $\sigma_x^2$ and $\sigma_y^2$ (lighter colors) by residue. f $\sigma_x^2$ and $\sigma_y^2$ (lighter colors) correlation plot
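The Monte Carlo error estimate described for Fig. 4 (Gaussian noise with the experimental standard deviation added to the RDCs, followed by re-analysis) can be sketched generically as below. This is only a minimal sketch: `monte_carlo_error` and the `analyze` callable are hypothetical names, and the stand-in analysis used in the example (a per-residue mean over alignment media) is a placeholder for the actual ORIUM calculation, which is not reproduced here.

```python
import numpy as np

def monte_carlo_error(rdcs, sigma, analyze, n_sim=1000, seed=0):
    """Propagate RDC measurement error through an analysis function.

    rdcs    : array of measured RDCs (e.g. residues x alignment media), in Hz
    sigma   : standard deviation of the experimental RDC error, in Hz
    analyze : hypothetical callable mapping an RDC array to derived parameters
    Returns the mean and sample standard deviation of the derived parameters.
    """
    rng = np.random.default_rng(seed)
    rdcs = np.asarray(rdcs, dtype=float)
    samples = np.array([
        # Each simulation perturbs every RDC with independent Gaussian noise.
        analyze(rdcs + rng.normal(0.0, sigma, size=rdcs.shape))
        for _ in range(n_sim)
    ])
    return samples.mean(axis=0), samples.std(axis=0, ddof=1)

# Placeholder analysis (assumption): average each residue over all media.
def mean_over_media(d):
    return d.mean(axis=1)

# Synthetic input: 4 residues measured in 25 media, 0.2 Hz error.
synthetic_rdcs = np.zeros((4, 25))
mc_mean, mc_std = monte_carlo_error(synthetic_rdcs, 0.2, mean_over_media)
print(mc_mean, mc_std)
```

For the placeholder analysis, the spread returned should be close to 0.2/√25 = 0.04 Hz, which gives a quick sanity check that the noise is propagated as intended before a real analysis function is substituted.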

Conclusions
With ORIUM, we present a computationally efficient method for extracting structural and dynamic information for inter-nuclear vectors from RDCs, unifying previously published concepts into one compact protocol. Furthermore, we demonstrate a new scheme for scaling the derived $S^2_{\mathrm{RDC}}$ parameters based on variances of a single type of RDC without needing $S^2_{\mathrm{LS}}$ as a constraint, which constitutes an upper limit of $S_{\mathrm{overall}}$. Dynamics occurring on time-scales slower than the rotational correlation time of proteins, encoded in RDCs, have important implications for protein functionality, including enzyme catalysis, molecular recognition, and correlated motions. The concepts set forth in this paper will go far in streamlining the procedure for calculating the dynamic average orientation and associated amplitudes of motion for inter-nuclear vectors in an iterative manner, as long as RDC data sets are acquired in at least five independent alignment media.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.