# High-resolution protein structure determination starting with a global fold calculated from exact solutions to the RDC equations

## Abstract

We present a novel structure determination approach that exploits the global orientational restraints from RDCs to resolve ambiguous NOE assignments. Unlike traditional approaches that bootstrap the initial fold from ambiguous NOE assignments, we start by using RDCs to compute accurate secondary structure element (SSE) backbones at the beginning of structure calculation. Our structure determination package, called rdc-Panda (*RDC*-based SSE *PA*cking with *N*OEs for Structure *D*etermination and NOE *A*ssignment), consists of three modules: (1) rdc-exact; (2) Packer; and (3) hana (*HA*usdorff-based *N*OE *A*ssignment). rdc-exact computes the global optimal solution of backbone dihedral angles for each secondary structure element by exactly solving a system of quartic RDC equations derived by Wang and Donald (Proceedings of the IEEE computational systems bioinformatics conference (CSB), Stanford, CA, 2004a; J Biomol NMR 29(3):223–242, 2004b), and systematically searching over the roots, each of which is a backbone dihedral ϕ- or ψ-angle consistent with the RDC data. Using a small number of unambiguous inter-SSE NOEs extracted using only chemical shift information, Packer performs a systematic search for the core structure, including all SSE backbone conformations. hana uses a Hausdorff-based scoring function to measure the similarity between the experimental spectra and the back-computed NOE pattern for each side-chain from a statistically-diverse rotamer library, and drives the selection of optimal position-specific rotamers for filtering ambiguous NOE assignments. Finally, a local minimization approach is used to compute the loops and refine side-chain conformations by fixing the core structure as a rigid body while allowing movement of loops and side-chains. rdc-Panda was applied to NMR data for the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein), human ubiquitin, the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), and the human Set2-Rpb1 interacting domain (hSRI). These results demonstrated the efficiency and accuracy of our algorithm, and show that rdc-Panda can be successfully applied for high-resolution protein structure determination using only a limited set of NMR data by first computing RDC-defined backbones.

### Keywords

Nuclear magnetic resonance Nuclear overhauser effect assignment Residual dipolar coupling Structure determination Packing### Abbreviations

- NMR
Nuclear magnetic resonance

- ppm
Parts per million

- RMSD
Root mean square deviation

- HSQC
Heteronuclear single quantum coherence spectroscopy

- NOE
Nuclear Overhauser effect

- NOESY
Nuclear Overhauser and exchange spectroscopy

- RDC
Residual dipolar coupling

- PDB
Protein Data Bank

- pol η UBZ
Ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta

- CH
C

^{α}-H^{α}- hSRI
Human Set2-Rpb1 interacting domain

- FF2
FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein)

- POF
Principal order frame

- SA
Simulated annealing

- MD
Molecular dynamics

- SSE
Secondary structure element

- C′
Carbonyl carbon

- WPS
Well-packed satisfying

- vdW
van der Waals

- SM
Supplementary Material

## Introduction

One of the main bottlenecks in protein NMR structure determination lies in the acquisition and interpretation of a sufficient number of accurate distance restraints from nuclear Overhauser effect (NOE) data, which is often obstructed by assignment ambiguities due to chemical shift degeneracy. Traditional NMR structure determination approaches (Güntert 2003; Mumenthaler et al. 1997; Gronwald et al. 2002; Huang et al. 2006; Kuszewski et al. 2004) rely on heuristic techniques such as molecular dynamics (MD) and simulated annealing (SA), and may use a variety of data, including chemical shifts, scalar couplings, NOEs and residual dipolar couplings (RDCs). In these approaches, however, RDCs are typically not employed in the annealing protocol until the end of structure calculation (i.e. refinement). Moreover, SA/MD based structure determination algorithms are used in these approaches as a subroutine in an iterative NOE assignment protocol. The SA/MD structure determination subroutine must typically be carefully initialized the first time using only reliable NOE assignments. The computed structures are then used to prune ambiguous NOE assignments. The SA/MD subroutine is used in an iterative fashion with new NOE assignments until convergence is reached.

Since NOEs provide merely *local* distance restraints, the correctness of an NOE assignment usually depends on the accuracy of other NOE assignments in its neighborhood. Thus, an error in an incorrect NOE assignment can be *propagated*, and hence influence the assignments of other NOEs. In contrast to NOE restraints, RDCs provide *global* orientational restraints on internuclear vectors, for example, backbone NH and C^{α}–H^{α} (henceforth abbreviated to CH) bond vectors with respect to a global alignment frame (Tolman et al. 1995; Tjandra and Bax 1997). In addition, in solution NMR RDC data can be collected with high precision, and assigned much faster than NOEs. Although several attempts have been made to find self-consistent NOE assignments (Güntert 2003; Mumenthaler et al. 1997; Herrmann et al. 2002; Linge et al. 2003; Gronwald et al. 2002; Huang et al. 2006; Kuszewski et al. 2004), little work has been done on exploiting other constraints such as residual dipolar couplings (RDCs) for resolving ambiguous NOE assignments. noah (Mumenthaler et al. 1997; Güntert 2003), for example, uses the structure determination package dyana (Herrmann et al. 2002), and starts with an initial set of NOE assignments with putatively one or two possible assignments. aria (Linge et al. 2003) and candid (Herrmann et al. 2002) improved on noah by incorporating better modeling of ambiguous distance constraints. For instance, in both programs, the form of a (∑*r*^{−6})^{−1/6} sum is used for handling ambiguous distances when multiple possible assignments exist for an NOE crosspeak. In auto-structure (Huang et al. 2006), several heuristic rules that simulate the expertise of manual assignment are included to generate an initial fold. In pasd (Kuszewski et al. 2004) several strategies were proposed to reduce the chance of invoking the structure calculation into a biased path due to the incorrect initial global fold. None of the above NOE assignment approaches applied the global constraints from RDC data to filter ambiguous NOE assignments.

In traditional NMR structure determination approaches, stochastic techniques such as SA/MD are employed in a tight inner-loop and are invoked several times to filter ambiguous NOE assignments (Güntert 2003). The objective function used in these stochastic techniques models both the empirical molecular mechanics energy and the satisfaction of experimental data for a protein structure. Unfortunately, these stochastic techniques can be trapped into local minima in the energy landscape of the objective function. Missing the global minimum structure solution during the iterative process can subsequently lead to incorrect NOE assignments. Furthermore, most previous approaches for automated NOE assignment (Herrmann et al. 2002; Linge et al. 2003) heavily depend on an accurate initial fold. However, the acquisition of a sufficient number of initial NOE assignments for computing a reliable starting fold is non-trivial, mainly due to the chemical shift degeneracy. Manual intervention and human expertise are often required in assigning these important initial NOEs in order to obtain a reliable initial fold.

To address the above issues, a polynomial time de novo algorithm, called rdc-exact, has been proposed to compute high-resolution backbone structures for secondary structure elements (SSEs) using a minimum amount of residual dipolar coupling (RDC) data (Wang and Donald 2004a, b; Wang et al. 2006). The accurate backbone conformations computed by this algorithm enable us to propose a new strategy for NOE assignment. For example, a novel NOE assignment algorithm (Wang and Donald 2005) was proposed to filter ambiguous NOE assignments based on an ensemble of distance intervals computed using intra-residue vectors mined from a rotamer database, against inter-residue vectors from the backbone structure determined from RDCs. This algorithm uses a *triangle-like inequality* between the intra-residue and inter-residue vectors to prune incorrect assignments for side-chain NOEs. However, the algorithm (Wang and Donald 2005) has the following deficiencies: (a) it does not exploit the diversity of the rotamers in the library, (b) it does not exploit the rotamer patterns observable in NOE spectra, and (c) uncertainty in NOE peak position suggests a probabilistic model with provable properties which the previous algorithm (Wang and Donald 2005) did not capture.

In this paper, we propose a new structure determination framework, called rdc-Panda (*RDC*-based SSE *PA*cking with *N*OEs for Structure *D*etermination and NOE *A*ssignment), by starting from an accurate global fold computed from RDCs. rdc-Panda consists of three modules: (1) rdc-exact, which computes orientations and conformations of SSE backbones; (2) Packer, which packs SSE backbones using sparse NOE restraints; and (3) hana (*HA*usdorff-based *N*OE *A*ssignment), which uses the SSE backbones to place side-chains and assign NOEs. Unlike previous approaches that randomly sample the conformation space to find solutions that satisfy experimental restraints (Tian et al. 2001; Hus et al. 2001; Andrec et al. 2004), rdc-exact solves a system of low-degree polynomial equations in closed-form formulated from RDC restraints and protein backbone kinematics, to compute the backbone dihedral angle solutions. rdc-exact employs a systematic search over the roots of the polynomial system to find the global optimal solution for each secondary structure element. Packer first extracts a small number of unambiguous inter-SSE NOEs from the NOESY spectra using only chemical shift information, and then performs a systematic 3-dimensional (3D) grid search over relative translations for the core structures, including all SSE backbone conformations. It considers all possible rotamers and discrete translation points that satisfy these sparse inter-SSE NOEs. The hana module uses a Hausdorff-based scoring function to measure the similarity between the experimental spectra and the back-computed NOE pattern for each rotamer from a statistically-diverse rotamer library (Lovell et al. 2000), and selects optimal position-specific rotamers for filtering ambiguous NOE assignments. Finally, a local minimization approach is used to compute loops, refine side-chain conformations, and eliminate steric clashes among side-chains. hana views the NOE assignment process as a *pattern-recognition* problem, where the objective is to establish a match by explicitly modeling the uncertainty in NOE peak positions, and thereby to choose an ensemble of rotamers with the best match scores between the experimental NOE data and the back-computed NOE pattern. Unlike previous, stochastic algorithms (Güntert 2003; Herrmann et al. 2002; Huang et al. 2006; Linge et al. 2003; Mumenthaler et al. 1997; Kuszewski et al. 2004) for NOE assignment, hana uses the reliable initial fold computed primarily from RDCs, and hence can effectively prune ambiguous NOE assignments. Our strategy for computing the global fold is similar to the hierarchical approaches in (Hayes-Roth et al. 1986; Chen et al. 1998; Delaglio et al. 2000), which apply the “local-to-global” idea and start with the SSEs to construct the global fold. In these approaches, however, SSEs are either determined by sampling the conformation space to find solutions satisfying the distance restraints (mainly from assigned NOEs) (Hayes-Roth et al. 1986; Chen et al. 1998), or selected from a structure database (Delaglio et al. 2000), while in rdc-Panda the global fold is defined from exact solutions to the RDC equations.

We applied rdc-Panda to four proteins: the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein) (FF2), human ubiquitin, the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), and the human Set2-Rpb1 interacting domain (hSRI). Our results show that rdc-Panda can achieve an accuracy of more than 90% for NOE assignment, and can calculate an ensemble of structures with backbone RMSD of 0.97 ± 0.30 Å and all-heavy-atom RMSD of 1.74 ± 0.36 Å from the reference structures. These results show that rdc-Panda can be successfully applied for high-resolution protein structure determination using only a limited set of NMR data by first computing RDC-defined backbones.

## Methods

### Overview

*core structure*. Next, in the hana module (Fig. 1C), side-chains are placed on the core structure from the rotamer library (Lovell et al. 2000) by comparing back-computed NOE patterns of rotamers versus the experimental NOESY spectra using a Hausdorff-based pattern matching technique (Zeng et al. 2008). Then the algorithm iteratively computes the NOE assignments and structures in loop regions (including side-chains) using a

*local minimization*approach (Fig. 1D), in which the core structure is fixed as a rigid body and only the loop regions and side-chains are allowed to move. Details of the local minimization approach are provided in SM Section S7. The local minimization approach allows side-chain flexibility and thus can eliminate the steric clash between rotameric side-chains that could not be resolved by the NOE pattern matching technique. The complete structure (including side-chains) computed by hana, called the

*low-resolution structure*, is then used to filter ambiguous NOE assignments. In Stage 2, the NOE assignment table computed by hana in Stage 1 is fed to the structure refinement protocol in xplor/xplor-nih to calculate high-resolution structures.

Unlike previous approaches, in which RDCs are only used in final structure refinement, our structure determination applies the global constraints from RDCs at the beginning, to compute an initial accurate global fold for subsequently pruning ambiguous NOE assignments. Moreover, our packing algorithm systematically searches all discrete translation points and outputs all packed structures that satisfy the experimental restraints. In addition, the pattern matching technique drives the selection of optimal rotamers for NOE assignment by statistically exploiting the conformational diversity of the rotamer library, effectively utilizing the accurate backbone conformations solved from RDCs, and efficiently mining the implicit rotamer patterns in the experimental NOE spectra.

### Input data

The input data to rdc-Panda include: (1) the primary sequence of the protein; (2) the 3D NOE peak list from both ^{15}N- and ^{13}C-edited spectra; (3) the resonance assignment list, including both backbone and side-chain resonance assignments; (4) the RDC data, including CH and NH RDCs, and the RDCs of other bond vectors (optional), such as C^{α}–C′ and N–C′ bond vectors; (5) the talos table of dihedral angle ranges from the chemical shift analysis (Cornilescu et al. 1999); (6) the rotamer library (Lovell et al. 2000). Only CH and NH RDCs in one medium are required by rdc-Panda to compute the backbone conformation and orientation, but other additional RDCs such as C^{α}–C′ and N–C′ RDCs can be also included. This additional data is used to prune the (ϕ, ψ) angles in the rdc-exact step.

### SSE backbone determination from residual dipolar couplings

Residual dipolar couplings (Tjandra and Bax 1997; Tolman et al. 1995) provide global orientational restraints on the internuclear bond vectors, such as NH and CH bond vectors, with respect to a global coordinate frame. Given NH RDCs in two aligning media, the associated NH vector must lie on the intersection of two conic curves (Skrynnikov and Kay 2000; Wedemeyer et al. 2002; Wang and Donald 2004b). In Wang et al. (2006), Wang and Donald (2004a, b), the authors proposed the first polynomial-time de novo algorithm, which we henceforth refer to as rdc-exact, to compute high-resolution protein backbone structures from RDC data. rdc-exact takes as input two RDCs per residue (e.g., assigned NH RDCs in two media or NH and CH RDCs in a single medium). In Wang and Donald (2004a, b), the authors also showed that given a peptide plane, the orientation of the next peptide plane can have at most 16 possible orientations; a related theorem was shown by (Hus et al. 2008). Given NH and CH RDCs in one medium, the associated NH and CH vectors can be solved from the RDC curve equations (see SM Section S2) and the protein backbone geometry (Wang and Donald 2004a; Wang et al. 2006). For the one-medium case, detailed proofs can be found in Wang and Donald (2004a, b), Wang et al. (2006), and for completeness we provide a somewhat simpler derivation in SM Section S2. The derivation closely mirrors our new (open-source) software implementation, and the clearer equations therein are easier to interpret and build upon. For a detailed review of this method see (Donald and Martin 2008).

rdc-exact does not randomly search the conformation space to find solutions consistent with the RDC data. Rather, it formulates the problem such that the structures computed are *exact solutions* of a system of quartic monomial equations derived from the RDC equation. Hence, these roots, and therefore the conformations, are discrete, finite and algebraic. All dihedral angles for each residue are solved exactly from the quartic RDC equations. A depth-first systematic search algorithm is applied to search over all possible roots (conformations) to find an optimal solution for a SSE. The scoring function used in the depth-first systematic search contains RDC RMSD (namely the sum of the squared differences between experimental and back-computed RDCs over all RDCs in the SSE), Ramachandran suitability, van der Waals packing, and other common empirical molecular mechanics energy terms (Wang and Donald 2004b, Wang et al. 2006). As previously described in those references, the computed dihedral angles for a residue are not solely dependent on the local RDC information at that residue. Rather, the algorithm searches over all SSE residues and finds the global minimum of the scoring function for each SSE.

Before applying the rdc-exact algorithm to solve backbones for the core structure, we first identify the SSE boundaries based on the dihedral angle ranges computed from talos (Cornilescu et al. 1999) based on chemical shift information. When the talos dihedral intervals for a residue are within the favored or allowed Ramachandran area of α-helix or β-strand, we consider this residue as part of that SSE. We note that other experimental data such as the J-coupling data from the NMR experiment HNHA (Vuister and Bax 1993) or the NOE patterns from automated assignment (Bailey-Kellogg et al. 2000) of SSEs can also be used to determine the SSE boundaries.

We have extended the rdc-exact algorithm (Wang and Donald 2004b, Wang et al. 2006) to incorporate C^{α}C′ and NC′ RDCs with CH and NH RDCs for the backbone calculation. The alignment tensor is estimated from NH and CH RDCs using the same approach as in (Wang and Donald 2004b,Wang et al. 2006). The C^{α}C′ and NC′ RDCs are excluded in the alignment tensor computation, because they are in general noisier than NH and CH RDCs. In the new version of rdc-exact, NH and CH RDCs are first applied to compute and enumerate conformations as solutions to the polynomial systems derived from the RDC equations. Then the remaining C^{α}C′ and NC′ RDCs are used to prune solutions for which the RMSD between back-computed and experimental RDCs is larger than thresholds 5.0 Hz (after being scaled to NH RDC). In addition, the outlier (ϕ, ψ) angle solutions are pruned by intersecting the talos ranges with the Ramachandran regions. At this stage, every residue (except glycine and proline) in the backbone structure is replaced with alanine. Such a scheme is called *alanine replacement*. A *serious steric clash* is defined between atoms *i* and *j* when the distance between them satisfies *d*_{ij} < (*r*_{i} + *r*_{j}) − ɛ, where *r*_{i} and *r*_{j} are van der Waals radii, and ɛ is the overlap threshold between two atoms. Currently we choose ɛ = 0.5 Å for the steric clash checking. We prune those (ϕ, ψ) solutions that result in serious steric clash after alanine replacement, and ensure that no serious clashes occur in our computed backbones. Later (below), we will replace the alanines with proper side-chain rotamers.

### SSE packing from sparse NOE restraints

Once the conformations and orientations of individual SSE backbones have been solved by rdc-exact, their translations can be determined using a small number of inter-SSE NOE distance restraints (Fowler et al. 2000, Wang and Donald 2004b). Although the NOE restraint provides only an interval bound on the distance between atoms, in general a small number of NOE distance restraints can confine the translation into a bounded conformation space in which all discretized solutions within the parameterized resolution can be enumerated using a systematic grid search.

When packing a pair of SSEs solved from RDCs, we must consider the fourfold orientational ambiguity. Although the orientational ambiguity certainly can be resolved given a sufficient number of independent media, it cannot be resolved based only on RDCs in a single medium. Suppose that we pack all SSEs sequentially, that is, we first assemble the first two SSEs, and then pack their combination with the third SSE, and so on, as previously described (Wang and Donald 2004b). Before packing each pair of SSEs, sparse inter-SSE NOE restraints were extracted using chemical shift information alone (details are provided in SM Section S3). Using these sparse NOEs plus a van der Waals packing score, we can prune the orientational ambiguity (Georgiev et al. 2008, Potluri et al. 2006).

Since most inter-SSE NOEs involve side-chain interactions, we must consider all of the possible side-chain rotamer conformations when using these unambiguous NOE restraints in packing each pair of the SSE backbones. Let *H*_{1} and *H*_{2} denote two SSE backbones to be packed together. Let *D* be the set of inter-SSE NOE restraints between *H*_{1} and *H*_{2}, where for \(d_i = (h_{i1}, h_{i2},\ell_i,u_i) \in D, h_{i1}, h_{i2}\) are the NOE-interaction protons in *H*_{1} and *H*_{2} respectively, and \({\ell_i,}\;u_i\) are the lower and upper bounds of the NOE. Our goal is to find all possible translations \({\bf t} \in {\mathbb{R}}^3\) between *H*_{1} and *H*_{2}, such that there exists a pair of rotamers in *H*_{1} and *H*_{2} in which the distance between the corresponding protons *h*_{i1} and *h*_{i2}, denoted by *d*_{i}, satisfies the NOE restraint, namely \(\ell_i \leq d_i \leq u_i.\)

*H*

_{1}and

*H*

_{2}, denoted by

*a*

_{0}and

*b*

_{0}, as the representative points for

*H*

_{1}and

*H*

_{2}respectively. Then the translation

**t**between

*H*

_{1}and

*H*

_{2}can be represented by the translation between points

*a*

_{0}and

*b*

_{0}, namely

**t**=

*a*

_{0}−

*b*

_{0}. Let

*a*be a proton of a residue in a particular rotamer state in

*H*

_{1}and

*b*be a proton of a residue in a particular rotamer state in

*H*

_{2}. Suppose that an NOE \((a,b,\ell_i,u_i)\) gives the distance restraint between proton

*a*and

*b*. Then we have

**t**

_{a}be the vector from

*a*

_{0}to

*a*, namely

*a*=

**t**

_{a}+

*a*

_{0}. Let

**t**

_{b}be the vector from

*b*

_{0}to

*b*, namely

*b*=

**t**

_{b}+

*b*

_{0}. Substituting into Eq. 1, we have

**t**to a spherical shell with center at

**t**

_{b}−

**t**

_{a}and radii of \(\ell_i\) and

*u*

_{i}. Let

*q*be the number of inter-SSE NOE restraints. Given the

*i*th NOE restraint, let λ

_{i}be the total number of rotamer pairs between two corresponding residues. Let

*A*

_{ij}denote the spherical shell that represents the

*i*th NOE restraint given the

*j*th pair of rotamers at corresponding residues, where 1 ≤

*i*≤

*q*and 1 ≤

*j*≤ λ

_{i}. Then the complete space of translation between

*H*

_{1}and

*H*

_{2}that satisfies all inter-SSE NOE restraints is represented by

*q*is the total number of NOE restraints, and λ

_{i}is the total number of rotamer pairs between two corresponding residues.

- (1)
When packing each pair of SSE backbones, consider all four-fold symmetries of SSE orientations due to the symmetry of the dipolar operator reflected in the RDCs.

- (2)
Apply a 3D grid search over relative translations \({\bf t} \in {\mathbb{R}}^3\) with a resolution of 0.2 Å to find all discrete translation points in which the set of solutions in Eq. 3 is not empty.

- (3)
Prune those packed structures containing steric clashes between atoms from difference SSE fragments.

- (4)
Cluster all packed structures from Step (2) using an agglomerative hierarchical clustering algorithm (Han and Kamber 2006), in which two packed structures are allowed to be in a cluster only if their backbone RMSD is within 0.4 Å. The centroids of all clusters form the final set of representative packed structures.

Our packing algorithm finds all of the discrete translation solutions within a parameterized resolution (i.e. 0.4 Å) that satisfy SSE orientations determined from RDCs and all inter-SSE NOE restraints. The time complexity analysis for Packer is given in SM Section S4. In practice, Packer runs in 30–60 minutes on a 3 GHz single-processor Linux workstation.

After we obtained the set of packed SSEs, which are called the *initial* packed structures, we computed a subset of *well-packed satisfying* (WPS) structures that had both high-quality van der Waals (vdW) score and good NOE satisfaction score (Potluri et al. 2006, 2007). We used a 6-12 Lennard-Jones potential to compute the packing score between SSEs, and a square-well potential (Brünger 1992) to calculate the NOE satisfaction score.

### NOE assignment using a pattern matching technique

The NOE assignment process can be divided into three phases, viz. initial NOE assignment (phase 1), rotamer selection (phase 2), and filtration of ambiguous NOE assignments (phase 3). The initial NOE assignment (phase 1) is done by considering all pairs of protons that are possibly assigned to an NOE cross peak if the resonances of corresponding atoms fall within a tolerance window around the NOE peak. We use error windows of 0.4 ppm for heavy atoms (^{15}N and ^{13}C), and 0.04 ppm for protons in the NOE assignment. In the rotamer selection phase (phase 2), we place all the side-chain rotamers of each residue into the backbone and compute all expected NOEs for protons within 6 Å apart. Based on the set of the expected NOEs and the resonance assignment list, we back-compute the expected NOE peak pattern for each rotamer. By matching the back-computed NOE pattern with the experimental NOE spectrum using an extended model of the Hausdorff distance (Huttenlocher and Jaquith 1995), we measure how well a rotamer fits the actual side-chain conformation when interpreted in terms of the NOE data (Fig. 1C). We then select the top *k* rotamers with highest fitness scores at each residue, and obtain a “low-resolution” structure, which typically has approximately 2.0–3.0 Å (all heavy atom) RMSD from the reference structures solved by X-ray or traditional NMR approaches. The low-resolution structure is then used (in phase 3) to filter ambiguous NOE assignments. The details of the NOE assignment algorithm are provided in SM Section S5.

### NOE pattern matching based on the Hausdorff distance measure

The Hausdorff distance measures the closeness between two sets of points by computing the distance from the point in one set that is farthest from any point in the other set, and vice versa. Two sets of points have a small Hausdorff distance if every point in one set is close to some point in the other set. Thus, the Hausdorff distance is suitable for determining the degree of resemblance between two sets of points when they are superimposed on each other. The Hausdorff distance has been widely used in image processing and computer vision, e.g., visual correspondence, pattern recognition, and shape matching (Huttenlocher and Kedem 1992). Unlike many other pattern-recognition algorithms, Hausdorff-based algorithms are combinatorially precise, and provide a robust method for measuring the similarity between two point sets or image patterns (Huttenlocher and Kedem 1992) in the presence of noise and positional uncertainties. In the NOE assignment problem, let *B* denote a back-computed NOE pattern, i.e., the set of back-computed NOE peaks, and let *Y* denote the set of experimental NOE peaks. The Hausdorff distance between *B* and *Y* can be measured by *H*(*B*,*Y*) = max(*h*(*B*,*Y*), *h*(*Y*,*B*)), where \(h(B,Y)=\hbox{max}_{b \in B} \hbox{min}_{y {\in}Y} \|b-y\|.\) The operations of nested Min and Max in the Hausdorff definition compute the point \(b\in B\) that is farthest from any point in *Y*, and measures the distance from point *b* to its closest neighbor *y* in *Y*. Thus, the Hausdorff distance *H*(*B*,*Y*) describes the discrepancy between the configurations of the two point sets *B* and *Y*, since it actually measures the the distance from a point in *B* that is farthest from any point in *Y*, and vice versa. A review article on the Hausdorff distance can be found in (Huttenlocher and Kedem 1992).

*B*with

*m*peaks, and the set of NOE peaks

*Y*with

*w*peaks, the τ

*-th Hausdorff distance*from

*B*to

*Y*is defined as

*m*values. We call

*s*= τ/

*m*the

*similarity score*between the back-computed NOE pattern

*B*and the experimental peak set

*Y*, after fixing the Hausdorff distance

*h*

_{τ}(

*B*,

*Y*) = δ, which is the error in chemical shift. The similarity score for a rotamer given δ can be computed using a scheme similar to that of (Huttenlocher and Jaquith 1995):

*Y*

_{δ}denotes the union of all balls obtained by replacing each point in

*Y*with a ball of radius δ,

*B*∩

*Y*

_{δ}denotes the intersection of sets

*B*and

*Y*

_{δ}, and \(|\cdot|\) denotes the size of a set.

*a*

_{1},

*a*

_{2},

*a*

_{3},

*d*) represent a

*distance restraint*back-computed from a structure, where

*a*

_{1}and

*a*

_{3}are the involved protons in the structure,

*a*

_{2}is the heavy atom covalently bound to the proton

*a*

_{1}, and

*d*is the distance between protons

*a*

_{1}and

*a*

_{3}. Let (

*p*

_{1},

*p*

_{2},

*p*

_{3},

*I*

_{p}) denote an

*experimental NOE peak*from a 3D NOESY spectrum, where

*p*

_{1}and

*p*

_{3}are frequencies of a pair of (unassigned) interacting protons,

*p*

_{2}is the frequency of the heavy atom covalently bound to the first proton, and

*I*

_{p}is the intensity of the cross peak. Let

*b*

_{i}= (ω(

*a*

_{1}), ω(

*a*

_{2}), ω(

*a*

_{3}),

*I*(

*d*)) denote the

*back-computed NOE peak*for a distance restraint (

*a*

_{1},

*a*

_{2},

*a*

_{3},

*d*) back-computed from a structure, where ω(

*a*

_{j}) is the assigned chemical shift of atom

*a*

_{j}, 1 ≤

*j*≤ 3, and

*I*(

*d*) is the back-computed peak intensity of distance

*d*. The equation

*I*(

*d*) =

*k*

*d*

^{−6}is used to back-compute the peak intensity

*I*(

*d*) from the distance

*d*, where the constant

*k*is calculated using the same strategy as in (Mumenthaler et al. 1997). The

*likelihood*for a back-computed peak

*b*

_{i}= (ω(

*a*

_{1}), ω(

*a*

_{2}), ω(

*a*

_{3}),

*I*(

*d*)) in the NOE pattern

*B*to match an experimental NOE peak within the distance δ in

*Y*

_{δ}can be defined as

*p*

_{1},

*p*

_{2},

*p*

_{3},

*I*

_{p}) is the experimental NOE peak matched to the back-computed NOE peak (ω(

*a*

_{1}), ω(

*a*

_{2}), ω(

*a*

_{3}),

*I*(

*d*)) under the Hausdorff distance measure, and \({\mathcal{N}}(|x-\mu|,\sigma)\) is the probability of observing the difference |

*x*−μ| in a normal distribution with mean μ and standard deviation σ. We note that the normal distribution and other similar distribution families have been widely used to model the noise in the NMR data, e.g., see (Rieping et al. 2005) and (Langmead and Donald 2004).

When back-computing the NOE pattern for each rotamer, hana also considers the stereospecific assignment ambiguity for H^{α} in glycine, all β-methylene protons, and methyl groups in leucine and valine. hana back-computes all possible NOE patterns resulting from the different possible stereospecific assignments for all protons in a residue, and chooses the stereospecific assignment with the best match score for each rotamer when compared versus the experimental data.

We provide the detailed pseudocode for computing the similarity score and for hana in SM Section S5. The time complexity analysis (see SM Section S6) shows that hana runs in polynomial time. In practice, hana runs in 1–2 minutes on a 3 GHz single-processor Linux workstation.

### Incorporation of rotamer probabilities into the scoring function

*X*

_{j}be the boolean proposition that the

*j*-th rotamer is selected. Let Pr(

*D*|

*X*

_{j}) be the likelihood function that quantifies the likelihood of observing data

*D*given the

*j*-th rotamer. Then Pr(

*D*|

*X*

_{j}) is equivalent to the similarity score in Eq. 5, that is,

*X*

_{j}|

*D*) is given by

## Results

We tested rdc-Panda on four proteins: the FF Domain 2 of human transcription elongation factor CA150 (RNA polymerase II C-terminal domain interacting protein) (FF2), the ubiquitin-binding zinc finger domain of the human Y-family DNA polymerase Eta (pol η UBZ), human ubiquitin, and the human Set2-Rpb1 interacting domain (hSRI). The lengths (number of amino acid residues) of these proteins are 62 for FF2, 39 for pol η UBZ, 76 for ubiquitin, and 112 for hSRI.

All NMR data except the RDC data of ubiquitin were recorded and collected using Varian 600 and 800 MHz spectrometers at Duke University. The NMR spectra were processed using the program NMRPipe (Delaglio et al. 1995). All NMR peaks were picked by the programs NMRView (Johnson and Blevins 1994) or Xeasy/Cara (Bartels et al. 1995), followed by manual editing. Backbone assignments were obtained from the set of triple NMR experiments HNCA, HN(CO)CA, HN(CA)CB, HN(COCA)CB, and HNCO, combined with the HSQC spectra using the program Paces (Coggins and Zhou 2003), followed by manual checking. The side-chain resonances were assigned from the HA(CA)NH, HA(CACO), HCCH-TOCSY, and HC(CCO)NH-TOCSY spectra. The NOE cross peaks were picked from three-dimensional ^{15}N- and ^{13}C-edited NOESY-HSQC spectra. In addition, we removed the diagonal cross peaks and water artifacts from the picked NOE peak list. The NH and CH RDC data of FF2, pol η UBZ and hSRI were measured from a 2D ^{1}H–^{15}N IPAP experiment (Ottiger et al. 1998) and a modified (HACACO)NH experiment (Ball et al. 2006) respectively. The C^{α}C′ and NC′ RDC data of FF2 were measured from a set of HNCO-based experiments (Permi et al. 2000). The CH and NH RDC data of ubiquitin were obtained from the Protein Data Bank (PDB ID: 1D3Z).

The solution structures of FF2, pol η UBZ, hSRI and ubiquitin have been solved using conventional NMR structure determination approaches (PDB ID of ubiquitin NMR reference structure: 1D3Z; PDB ID of FF2 NMR reference structure: 2E71; PDB ID of pol η UBZ NMR reference structure: 2I5O; PDB ID of hSRI NMR reference structure: 2A7O). In addition, the X-ray structure of human ubiquitin (PDB ID: 1UBQ) was available. We used these previously-solved NMR or X-ray structures as the reference structures for evaluating the structure calculation results from rdc-Panda.

### Evaluation of SSE backbones determined from RDCs

Results on SSE backbones computed by rdc-exact

Proteins | Backbone RMSD (Å) versus NMR structure | Backbone RMSD (Å) versus X-ray structure | RDC RMSD (Hz) |
---|---|---|---|

Ubiquitin | α | α | CH: 1.05; |

β | β | NH: 1.42. | |

β | β | ||

β | β | ||

β | β | ||

β | β | ||

hSRI | α | N/A | CH: 1.31; |

α | NH: 1.01. | ||

α | |||

pol η UBZ | α | N/A | CH: 1.98; |

β | NH: 2.27; | ||

β | |||

FF2 | α | N/A | CH: 0.31; |

α | NH: 0.23; | ||

α | C | ||

NC′: 2.65. |

^{α}C′ and NC′ RDCs, likely because C

^{α}C′ and NC′ RDCs contain more noise than CH and NH RDCs. Most back-computed RDCs are within deviation 0–2.0 Hz from the experimental RDCs. The maximum deviations (about 3–6 Hz) between back-computed and experimental RDCs occur in the short β-strands with length less than 3 residues, including residues K48-L50 in ubiquitin, and residues Q9-P11 and L18-P20 in pol η UBZ, which indicates that rdc-exact has difficulty in computing accurate orientations for these short structure fragments. When only a small number of RDCs are available for the short fragments, it is difficult for rdc-exact to calculate the orientations of fragments in the principal order frame (POF) accurately from the RDC equations. Moreover, short α-helices or β-strands often contain proline as their terminal residues and hence do not contain the observed NH RDC in the proline residue. In addition, when a modified (HACACO)NH experiment (Ball et al. 2006) is used to measure RDCs, the CH RDC in the residue proceeding proline is also missing. This situation makes it more difficult to solve the backbone conformation. For example, in pol η UBZ only two NH RDCs and two CH RDCs are available for each β-strand (namely residues Q9-P11 and L18-P20) (since each strand contains one proline), which make it harder to calculate the accurate orientations and conformations for the β-sheet in pol η UBZ. The paucity of NH RDCs in the short β-strands might explain why the overall RDC RMSD of pol η UBZ is slightly larger than other three proteins (Table 1). However, as we will demonstrate below, such structure deviations in the short β-strands are manageable and did not significantly degrade our NOE assignment and structure calculation results.

### Quality of SSE packing

Summary of SSE packing results (computed by Packer)

Proteins | Number of unambiguous NOEs used for packing | Total number of packed structures | Total number of WPS structures | Average backbone RMSD to mean structure (all, WPS) (Å) | Backbone RMSD between mean and reference structure (all, WPS) (Å) | Percentage of compatible SSE NOE assignments (%) |
---|---|---|---|---|---|---|

Ubiquitin | 3 | 1658 | 117 | 1.21, 0.69 | To NMR structure: 0.96, 0.86 | 91.1 |

To X-ray structure: 1.00, 0.87 | ||||||

hSRI | 12 | 2386 | 110 | 1.66, 1.32 | To NMR structure: 1.84, 1.79 | 90.5 |

pol η UBZ | 7 | 1645 | 54 | 1.63, 0.79 | To NMR structure: 1.12, 0.97 | 93.6 |

FF2 | 9 | 1472 | 139 | 1.36, 1.08 | To NMR structure: 1.50, 1.31 | 91.1 |

The ensemble of the initial packed structures have average backbone RMSD 1.2–1.7 Å to the mean structure, while the ensemble of the WPS structures have average backbone RMSD 0.6–1.4 Å to the mean structure (Table 2). The mean structure has backbone RMSD 0.9–1.9 Å for the initial packed structures and backbone RMSD 0.8–1.8 Å for WPS structures to the reference structure, which indicates that the computation of WPS structures yields an ensemble of packed structures closer to the reference structure by improving the RMSD between the computed structures and the reference structure. Moreover, the selection of WPS structures increases the percentage of the structures that are within RMSD 2.0 Å to the reference structure in the chosen ensemble (Fig. 3 and SM Fig. S3, columns 2–3).

To check the NOE assignment performance resulting from the packed structures, we investigated the percentage of the SSE NOE assignments computed by hana that agreed with the reference structure. We say an NOE assignment is *compatible* if it is consistent with the reference structure (namely the distance between the assigned pair of NOE protons in the reference structure satisfies the NOE restraint calibrated from peak intensity), otherwise we call it an *incompatible* NOE assignment. Then the percentage of compatible SSE NOE assignments is defined as the fraction of SSE NOE assignments (over all SSE NOE assignments) computed by hana. As shown in Table 2, hana computes more than 90% compatible NOE assignments for SSE regions, after placing the side-chains onto the packed backbones. Together these results show that our packing algorithm can assemble SSE backbones and compute sufficiently accurate core structures (within backbone RMSD 0.8–1.8 Å to the reference structure) such that they can then be effectively used for filtering ambiguous NOE assignments.

### Evaluation of rotamers computed by hana

To assess the accuracy of rotamers selection from hana, we studied the details of computed rotamers in hana’s low-resolution structure of ubiquitin. Our test on ubiquitin shows that hana selects more than 70% of rotamers that are consistent with either the X-ray or NMR reference structure (SM Fig. S4, Fig. S5 and Fig. S7).

Further tests show that hana computes all accurate leucine rotamers that are consistent with either X-ray or NMR reference structures (SM Fig. S6). In addition, the number of consistent rotamers does not vary significantly with the backbone resolution (the variance is less than 10% of the consistent rotamers), which indicates that our rotamer selection algorithm is not sensitive to small variations in the backbone conformation (SM Fig. S7).

We classified all residues into two categories: *class I* residues, including valine, isoleucine, leucine, methionine, phenylalanine, tryptophan and cysteine; and *class II* residues, including remaining residues except alanine and glycine. Note that class I residues are generally more hydrophobic than class II residues. Alanine and glycine are excluded since they have only one conformation. Our investigation shows that hana chose 90.0% correct rotamers (i.e., consistent with either the X-ray or NMR reference structure) for class I residues, and 65.1% correct rotamers for class II residues. The reason that our algorithm finds fewer correct rotamers for class II residues than for class I residues is because more class II residues are located on the protein surface. A further investigation shows that about three quarters of class II residues in ubiquitin exhibit solvent-accessibility over 20%, while more than 90% of class I residues are buried inside the protein with solvent-accessibility below 20%. It can be easily rationalized that more NOEs are generally observed for class I residues buried in the core of the protein than for those class II residues on the surface. Thus, class II residues generally have fewer observable NOE restraints to constrain the correct rotamers than class I residues do. On the other hand, as we will demonstrate in the next subsection, the wrong NOE assignments caused by these incorrect rotamers in class II residues are manageable and do not degrade the quality of our final structure ensemble.

### Evaluation of calculated structures

The final NOE assignment table computed by hana was fed into the structure calculation software xplor/xplor-nih together with other experimental restraints including dihedral angle and hydrogen bond distance constraints. We used the same hydrogen bond and dihedral angle constraints as in computing the NMR reference structures, hence our structure calculation test should be fair to demonstrate the accuracy of our NOE assignment results. Compared with the calculation of the NMR reference structures, in which RDCs were incorporated along with NOE restraints only during the final structure calculation, here we only used RDCs to compute the initial backbone fold. From an algorithmic point of view, our final structure determination using NOEs but not RDCs can be considered a good control test of the quality of the NOE assignment. The structure calculation was performed in two rounds. For FF2, the atomic coordinates of the ensemble of top 20 structures with the lowest energies, using the NOE assignments computed by our algorithm, have been deposited into the Protein Data Bank (PDB ID: 2KIQ).

Results on rdc-Panda’s NOE assignments and structures calculated from those assignments using xplor

| Ubiquitin | hSRI | pol η UBZ | FF2 |
---|---|---|---|---|

A: Summary of NOE assignments computed by rdc-Panda | ||||

1. NOE restraints computed by rdc-Panda | 1331 | 3327 | 960 | 1070 |

Intraresidue | 570 | 1193 | 429 | 422 |

Sequential (| | 353 | 612 | 244 | 243 |

Medium-range (| | 167 | 832 | 188 | 259 |

Long-range (| | 242 | 690 | 99 | 146 |

B: Summary of structures calculated from rdc-Panda’s NOE assignments using xplor | ||||

2. NOE violations (> 0.5 Å) | 0 | 0 | 0 | 0 |

3. Other experimental restraints | ||||

Hydrogen bonds | 27 | 80 | 30 | 46 |

Dihedral angle restraints | 61 | 178 | 74 | 96 |

4. Average RMSD to mean coordinates | ||||

SSE region (backbone, heavy) (Å) | 0.28, 0.69 | 0.39, 0.85 | 0.29, 0.58 | 0.53, 1.11 |

Ordered region (backbone, heavy) (Å) | 0.36, 0.73 | 0.44, 0.86 | 0.31, 0.61 | 0.63, 1.24 |

5. Ramachandran plot | ||||

Favored (%) | 90.5 | 92.3 | 94.2 | 90.8 |

Allowed (%) | 100 | 100 | 100 | 100 |

6. Deviation from idealized geometry | ||||

Bonds (Å) | 0.0013 ± 0.0001 | 0.0013 ± 0.0003 | 0.0014 ± 0.0002 | 0.0009 ± 0.0002 |

Angles (°) | 0.41 ± 0.01 | 0.48 ± 0.03 | 0.51 ± 0.02 | 0.36 ± 0.02 |

Impropers (°) | 0.22 ± 0.02 | 0.35 ± 0.02 | 0.35 ± 0.02 | 0.22 ± 0.01 |

7. RMSD to X-ray structure | ||||

SSE region (backbone, heavy) (Å) | 0.82, 1.49 | N/A | N/A | N/A |

Ordered region (backbone, heavy) (Å) | 0.96, 1.54 | N/A | N/A | N/A |

8. RMSD to NMR reference structure | ||||

SSE region (backbone, heavy) (Å) | 0.75, 1.35 | 1.09, 2.09 | 0.47, 1.22 | 0.76, 1.57 |

Ordered region (backbone, heavy) (Å) | 0.85, 1.38 | 1.21, 2.09 | 0.54, 1.50 | 1.26, 1.97 |

## Discussion

We have compared our Hausdorff-based algorithm with other metrics, such as a Bayesian metric (i.e., essentially changing lines 9/10 of SM Algorithm 1 to multiplication) and an RMSD-based metric. In the Bayesian metric, when more experimental peaks are observed around a back-computed peak (within the error window), the likelihood of this back-computed NOE peak is weakened rather than been enhanced. In our Hausdorff-based metric, only the closest experimental peak within the error window is matched to the corresponding back-computed peak. It will therefore be more robust than the Bayesian metric, and less affected by noisy peaks. The RMSD-based metric can be biased when experimental peaks are missing (i.e., when no experimental peaks are observed within the error window of the back-computed NOE peak). This is because when experimental peaks are missing, the RMSD-based metric will find a closest experimental peak that incorrectly matches to the back-computed NOE peak. Tests on our ubiquitin data show that the Hausdorff-based measurement performs 6–12% better than the other two metrics, in terms of percentage of consistent rotamers and compatible NOE assignments. We believe that the Hausdorff-based metric is in general more robust to the noisy and missing peaks, which are common in the NMR data. Further discussions of the differences between our algorithm and previous approaches can be found SM Sections S2 and S5.

We note that recently a similar pattern-matching technique has been independently proposed in ascan (Fiorito et al. 2008) to compare the back-computed NOE pattern with the experimental NOE data for the side-chain resonance assignment. ascan (Fiorito et al. 2008) computes the initial fold from a subset of NOE assignments based on given backbone resonance assignments and a subset of highly confident side-chain resonance assignments, and then uses an iterative strategy to refine the side-chain assignment, NOE assignment, and structure calculation. Compared to ascan (Fiorito et al. 2008), our approach starts with an RDC-defined backbone and performs a systematic search for the rotamer selection, and thus is potentially a more robust approach for the structure determination. The Hausdorff-based pattern matching technique for NOE assignment, which we introduced in Zeng et al. (2008), also allows us to efficiently measure the similarity between the back-computed NOE patterns and the experimental spectra in the presence of noise and experimental uncertainty.

### Limitations and extensions

Our approach assumes that dynamics can be neglected, although it has shown in recent studies that modest dynamics averaging can be tolerated, albeit with reduced accuracy in the calculation of the bond vector orientations (Ruan et al. 2008). When order parameters *S*^{2} are measured for the same bond vectors as the RDCs (e.g. using relaxation experiments), we can neglect the dynamics within the time scale of the dynamics measurements. Thus, we can heuristically assume that when *S*^{2} is sufficiently uniform (i.e. the core of the protein is largely rigid), then the dynamic averaging due to *S* in the RDC measurement is safe to tolerate for the structure determination.

Although our current implementation of hana uses 3D NOE spectra, hana is general and can be easily extended to higher-dimensional (e.g., 4D) NOE data (Coggins and Zhou 2008). In addition, it would be interesting to extend the current version of hana for NOE assignment with missing resonance assignments. In principle, hana can be extended to assign the NOEs with a partially assigned resonance list, as long as the back-computed NOE patterns with missing peaks can still support the identification of the accurate rotamers.

Our structure determination starts with the high-resolution core structure computed from RDCs. The loop regions are computed by a local minimization approach, which does not incorporate the RDC data into the structure calculation. Thus, the loop regions are less accurate than the SSE regions in our final structures (see Table 3 and Fig. 6). Currently we are developing efficient algorithms for computing the loop conformations that satisfy both NOE and RDC data.

## Conclusion

We have developed a novel algorithm that exploits the global RDC restraints for filtering ambiguous NOE assignments. We provided a novel approach for computing the initial structure template for NOE assignment by exactly solving backbones from RDCs and systematically choosing rotamers based on NOE pattern matching. Our automated structure calculation framework extends previous work (Wang and Donald 2004b, Wang et al. 2006) on backbone calculation from RDCs to high-resolution structure determination, and introduces a systematic packing algorithm that finds the ensemble of all packed structures and considers all possible side-chain conformations consistent with all inter-SSE NOE restraints. We presented a new automated NOE assignment algorithm that simultaneously exploits the accurate high-resolution backbone computation from RDC data (Wang and Donald 2004b, Wang et al. 2006), the statistical diversity of rotamers from a rotamer library (Lovell et al. 2000), and the robust Hausdorff measure (Huttenlocher and Jaquith 1995) for comparing the back-computed NOE patterns versus the experimental NOE spectra to choose accurate rotamers. Our algorithm computed the NOE assignments with high accuracy. Tests on NMR data for four proteins yielded accurate NOE assignment and structure calculation results, and suggest that rdc-Panda can play an important role in high-throughput structure determination.

## Availability

The rdc-Panda software is available by contacting the authors, and is distributed open-source under the GNU Lesser General Public License (Gnu, 2002).

## Notes

### Acknowledgements

We thank Dr. A. Yershova, Mr. J. Martin, Prof. J. Richardson, Prof. D. Richardson, Dr. S. Apaydin and all members of the Donald and Zhou Labs for helpful discussions and comments. We are grateful to Ms. M. Bomar for helping us with pol η UBZ NMR data. We thank Mr. D. Keedy, Prof. J. Richardson, Prof. D. Richardson and other members in the Richardson lab for helpful comments on our computed FF2 structures. This work is supported by the following Grants from National Institutes of Health: R01 GM-65982 to B. R. Donald and R01 GM-079376 to P. Zhou.

## Supplementary material

### References

- Andrec M, Du P, Levy RM (2004) Protein backbone structure determination using only residual dipolar couplings from one ordering medium. J Biomol NMR 21:335–347CrossRefGoogle Scholar
- Bailey-Kellogg C, Widge A, Kelley JJ, Berardi MJ, Bushweller JH, Donald BR (2000) The NOESY Jigsaw: automated protein secondary structure and main-chain assignment from sparse, unassigned NMR data. J Comput Biol 7(3–4):537–558CrossRefGoogle Scholar
- Ball G, Meenan N, Bromek K, Smith BO, Bella J, Uhrín D (2006) Measurement of one-bond
^{13}C^{α}-^{1}H^{α}residual dipolar coupling constants in proteins by selective manipulation of C^{α}H^{α}spins. J Magn Reson 180:127–136ADSGoogle Scholar - Bartels C, Xia T, Billeter M, Güntert P, Wüthrich K (1995) The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J Biomol NMR 6:1–10CrossRefGoogle Scholar
- Bomar MG, Pai M, Tzeng S, Li S, Zhou P (2007) Structure of the ubiquitin-binding zinc finger domain of human DNA Y-polymerase η. EMBO Rep 8:247–251CrossRefGoogle Scholar
- Brünger AT (1992) X-PLOR, Version 3.1: a system for X-ray crystallography and NMR. J Am Chem SocGoogle Scholar
- Chen CC, Singh JP, Altman RB (1998) The hierarchical organization of molecular structure computations. J Comput Biol 5:409–422CrossRefGoogle Scholar
- Coggins BE, Zhou P (2003) PACES: protein sequential assignment by computer-assisted exhaustive search. J Biomol NMR 26:93–111CrossRefGoogle Scholar
- Coggins BE, Zhou P (2008) High resolution 4-D spectroscopy with sparse concentric shell sampling and FFT-CLEAN. J Biomol NMR 42: 225–239CrossRefGoogle Scholar
- Cornilescu G, Marquardt JL, Ottiger M, Bax A (1998) Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J Am Chem Soc 120:6836–6837CrossRefGoogle Scholar
- Cornilescu G, Delaglio F, Bax A (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13:289–302CrossRefGoogle Scholar
- Davis IW, Murray LW, Richardson JS, Richardson DC (2004) MolProbity: structure validation and all-atom contact analysis for nucleic acids and their complexes. Nucleic Acids Res 32:W615–W619CrossRefGoogle Scholar
- Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293CrossRefGoogle Scholar
- Delaglio F, Kontaxis G, Bax A (2000) Protein structure determination using molecular fragment replacement and NMR dipolar couplings. J Am Chem Soc 122:2142–2143CrossRefGoogle Scholar
- Donald BR, Martin J (2009) Automated NMR assignment and protein structure determination using sparse dipolar coupling constraints. Prog NMR Spectrosc 55(2):101–127CrossRefGoogle Scholar
- Fiorito F, Herrmann T, Damberger F, Wüthrich K (2008) Automated amino acid side-chain NMR assignment of proteins using (13)C- and (15)N-resolved 3D [(1)H, (1)H]-NOESY. J Biomol NMR 42:23–33CrossRefGoogle Scholar
- Fowler CA, Tian F, Al-Hashimi HM, Prestegard JH (2000) Rapid determination of protein folds using residual dipolar couplings. J Mol Biol 304:447–460CrossRefGoogle Scholar
- Georgiev I, Lilien RH, Donald BR (2008) The minimized dead-end elimination criterion and its application to protein redesign in a hybrid scoring and search algorithm for computing partition functions over molecular ensembles. J Comput Chem 29:1527–1542CrossRefGoogle Scholar
- Gronwald W, Moussa S, Elsner R, Jung A, Ganslmeier B, Trenner J, Kremer W, Neidig K-P, Kalbitzer HR (2002) Automated assignment of NOESY NMR spectra using a knowledge based method (KNOWNOE). J Biomol NMR 23:271–287CrossRefGoogle Scholar
- Güntert P (2003) Automated NMR protein structure determination. Prog Nucl Magn Reson Spectrosc 43:105–125CrossRefGoogle Scholar
- Han J, Kamber M (2006) Data mining: concepts and techniques. Morgan Kaufmann Publishers, San FranciscoGoogle Scholar
- Hayes-Roth B, Buchanan B, Lichtarge O, Hewtt M, Altman R, Brinkley J, Cornelius C, Duncan B, Jardetzky O (1986) PROTEAN: deriving protein structure from constraints. In: Proceedings of the fifth national conference on artificial intelligence, pp 904–908Google Scholar
- Herrmann T, Güntert P, Wüthrich K (2002) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J Mol Biol 319(1):209–227CrossRefGoogle Scholar
- Huang YJ, Tejero R, Powers R, Montelione GT (2006) A topology-constrained distance network algorithm for protein structure determination from NOESY data. Proteins Struct Funct Bioinform 62(3):587–603CrossRefGoogle Scholar
- Hus JC, Marion D, Blackledge M (2001) Determination of protein backbone structure using only residual dipolar couplings. J Am Chem Soc 123:1541–1542CrossRefGoogle Scholar
- Hus JC, Salmon L, Bouvignies G, Lotze J, Blackledge M, Brüschweiler R (2008) 16-Fold Degeneracy of peptide plane orientations from residual dipolar couplings: analytical treatment and implications for protein structure determination. J Am Chem Soc 130:15927–15937CrossRefGoogle Scholar
- Huttenlocher DP, Jaquith EW (1995) Computing visual correspondence: incorporating the probability of a false match. In: Proceedings of the fifth international conference on computer vision (ICCV 95), pp 515–522Google Scholar
- Huttenlocher DP, Kedem K (1992) Distance metrics for comparing shapes in the plane. In: Donald BR, Kapur D, Mundy J (eds) Symbolic and numerical computation for artificial intelligence. Academic Press, New York, pp 201–219Google Scholar
- Johnson BA, Blevins RA (1994) NMRView: a computer program for the visualization and analysis of NMR data. J Biomol NMR 4:603–614CrossRefGoogle Scholar
- Kuszewski J, Schwieters CD, Garrett DS, Byrd RA, Tjandra N, Clore GM (2004) Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear overhauser enhancement spectra and chemical shift assignments. J Am Chem Soc 126(20):6258–6273CrossRefGoogle Scholar
- Langmead C, Donald B (2004) An expectation/maximization nuclear vector replacement algorithm for automated NMR resonance assignments. J Biomol NMR 29(2):111–138CrossRefGoogle Scholar
- Li M, Phatnani HP, Guan Z, Sage H, Greenleaf AL, Zhou P (2005) Solution structure of the Set2-Rpb1 interacting domain of human Set2 and its interaction with the hyperphosphorylated C-terminal domain of Rpb1. Proc Natl Acad Sci 102:17636–17641CrossRefADSGoogle Scholar
- Linge JP, Habeck M, Rieping W, Nilges M (2003) ARIA: automated NOE assignment and NMR structure calculation. Bioinformatics 19(2):315–316CrossRefGoogle Scholar
- Lovell SC, Word JM, Richardson JS, Richardson DC (2000) The penultimate rotamer library. Proteins Struct Funct Genet 40:389–408CrossRefGoogle Scholar
- Mumenthaler C, Güntert P, Braun W, Wüthrich K (1997) Automated combined assignment of NOESY spectra and three-dimensional protein structure determination. J Biomol NMR 10(4):351–362CrossRefGoogle Scholar
- Ottiger M, Delaglio F, Bax A (1998) Measurement of J and dipolar couplings from simplified two-dimensional NMR spectra. J Magn Reson 138:373–378CrossRefGoogle Scholar
- Permi P, Rosevear PR, Annila A (2000) A set of HNCO-based experiments for measurement of residual dipolar couplings in
^{15}N,^{13}C, (^{2}H)-labeled proteins. J Biomol NMR 17:43–54CrossRefGoogle Scholar - Potluri S, Yan AK, Chou JJ, Donald BR, Bailey-Kellogg C (2006) Structure determination of symmetric homo-oligomers by a complete search of symmetry configuration space using NMR restraints and van der Waals packing. Proteins 65:203–219CrossRefGoogle Scholar
- Potluri S, Yan AK, Donald BR, Bailey-Kellogg C (2007) A complete algorithm to resolve ambiguity for intersubunit NOE assignment in structure determination of symmetric homo-oligomers. Protein Sci 16:69–81CrossRefGoogle Scholar
- Rieping W, Habeck M, Nilges M (2005) Inferential structure determination. Science 309:303–306CrossRefADSGoogle Scholar
- Ruan K, Briggman KB, Tolman JR (2008) De novo determination of internuclear vector orientations from residual dipolar couplings measured in three independent alignment media. J Biomol NMR 41:61–76CrossRefGoogle Scholar
- Skrynnikov NR, Kay LE (2000) Assessment of molecular structure using frame-independent orientational restraints derived from residual dipolar couplings. J Biomol NMR 18(3):239–252CrossRefGoogle Scholar
- Tian F, Valafar H, Prestegard JH. (2001) A dipolar coupling based strategy for simultaneous resonance assignment and structure determination of protein backbones. J Am Chem Soc 123:11791–11796CrossRefGoogle Scholar
- Tjandra N, Bax A (1997) Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278:1111–1114CrossRefADSGoogle Scholar
- Tolman JR, Flanagan JM, Kennedy MA, Prestegard JH (1995) Nuclear magnetic dipole interactions in field-oriented proteins: Information for structure determination in solution. Proc Natl Acad Sci USA 92:9279–9283CrossRefADSGoogle Scholar
- Vijay-Kumar S, Bugg CE, Cook WJ (1987) Structure of ubiquitin refined at 1.8 A resolution. J Mol Biol 194:531–544CrossRefGoogle Scholar
- Vuister GW, Bax A (1993) Quantitative J correlation: a new approach for measuring homonuclear three-bond J(H
^{N}H^{α}) coupling constants in^{15}N-enriched proteins. J Am Chem Soc 115:7772–7777CrossRefGoogle Scholar - Wang L, Donald BR (2004a) Analysis of a systematic search-based algorithm for determining protein backbone structure from a minimal number of residual dipolar couplings. In: Proceedings of The IEEE computational systems bioinformatics conference (CSB), Stanford, CA (August 2004), pp 319–330, PMID: 16448025Google Scholar
- Wang L, Donald BR (2004b) Exact solutions for internuclear vectors and backbone dihedral angles from NH residual dipolar couplings in two media, and their application in a systematic search algorithm for determining protein backbone structure. J Biomol NMR 29(3):223–242CrossRefGoogle Scholar
- Wang L, Donald BR (2005) An efficient and accurate algorithm for assigning nuclear overhauser effect restraints using a rotamer library ensemble and residual dipolar couplings. In: The IEEE computational systems bioinformatics conference (CSB), Stanford, CA (August 2005), pp 189–202, PMID: 16447976Google Scholar
- Wang L, Mettu R, Donald BR (2006) A polynomial-time algorithm for de novo protein backbone structure determination from NMR data. J Comput Biol 13(7):1276–1288CrossRefMathSciNetGoogle Scholar
- Wedemeyer WJ, Rohl CA, Scheraga HA (2002) Exact solutions for chemical bond orientations from residual dipolar couplings. J Biomol NMR 22:137–151CrossRefGoogle Scholar
- Zeng J, Tripathy C, Zhou P, Donald BR (2008) A Hausdorff-based NOE assignment algorithm using protein backbone determined from residual dipolar couplings and rotamer patterns. In: Proceedings of the 7th annual international conference on computational systems bioinformatics, Stanford CA, pp 169–181. ISBN 1752–7791. PMID: 19122773Google Scholar