Structure-based protein NMR assignments using native structural ensembles
- Cite this article as:
- Apaydın, M.S., Conitzer, V. & Donald, B.R. J Biomol NMR (2008) 40: 263. doi:10.1007/s10858-008-9230-x
An important step in NMR protein structure determination is the assignment of resonances and NOEs to corresponding nuclei. Structure-based assignment (SBA) uses a model structure (“template”) for the target protein to expedite this process. Nuclear vector replacement (NVR) is an SBA framework that combines multiple sources of NMR data (chemical shifts, RDCs, sparse NOEs, amide exchange rates, TOCSY) and has high accuracy when the template is close to the target protein’s structure (less than 2 Å backbone RMSD). However, a close template may not always be available. We extend the circle of convergence of NVR for distant templates by using an ensemble of structures. This ensemble corresponds to the low-frequency perturbations of the given template and is obtained using normal mode analysis (NMA). Our algorithm assigns resonances and sparse NOEs using each of the structures in the ensemble separately, and aggregates the results using a voting scheme based on maximum bipartite matching. Experimental results on human ubiquitin, using four distant template structures, show an increase in the assignment accuracy. Our algorithm also improves the robustness of NVR with respect to structural noise. We provide a confidence measure for each assignment using the percentage of the structures that agree on that assignment. We use this measure to assign a subset of the peaks with even higher accuracy. We further validate our algorithm with NVR on data for two additional proteins. We then show the general applicability of our approach by applying our NMA ensemble-based voting scheme to another SBA tool, MARS. For three test proteins with corresponding templates, including the 370-residue maltose binding protein, we increase the number of reliable assignments made by MARS. Finally, we show that our voting scheme is sound and optimal by proving that it is a maximum likelihood estimator of the correct assignments.
Keywords: Automated NMR assignments; Normal mode analysis; NMR structural biology; Protein flexibility via structural ensembles; Structural bioinformatics
Abbreviations
- bb RMSD: Backbone root mean square distance
- EIN: N-terminal domain of enzyme I
- GαIP: G-α interacting protein
- MBM: Maximum bipartite matching
- MLE: Maximum likelihood estimator
- NMA: Normal mode analysis
- NMR: Nuclear magnetic resonance
- NOE: Nuclear Overhauser effect
- NVR: Nuclear vector replacement
- RDC: Residual dipolar coupling
- SPG: Streptococcal protein G
One of the key steps in NMR protein structure determination is resonance and NOE assignments. The assignment problem requires mapping spectral peaks to tuples of interacting atoms in a protein. In this paper, we report a new algorithm for automated structure-based NMR assignments by exploiting an ensemble of structural templates.
Structure-based assignment (SBA) denotes automated assignment given prior information in the form of the putative structure (“template”) of the protein. By analogy, in X-ray crystallography, the molecular replacement (MR) technique allows solution of the crystallographic phase problem when a “close” or homologous structural model is known, thereby facilitating rapid structure determination (Rossman and Blow 1962). An automated procedure for rapidly determining NMR assignments given a homologous structure will similarly accelerate structure determination. Furthermore, even when the structure has already been determined by crystallography or homology modeling, NMR assignments are valuable to probe protein–protein interactions and protein–ligand binding (via chemical shift mapping or line-broadening). Previous SBA algorithms include CAP (Al-Hashimi and Patel 2002; Hus et al. 2002), NVR (Langmead et al. 2003; Langmead and Donald 2004a), (Meiler and Baker 2003), and MARS (Jung and Zweckstetter 2004b). The idea of correlating unassigned experimentally-measured residual dipolar couplings (RDCs) with bond vector orientations from a known structure was first proposed by Al-Hashimi and Patel (2002) and subsequently demonstrated by Al-Hashimi et al. (2002) who considered permutations of assignments for RNA. In Hus et al. (2002), RDC-based maximum bipartite matching (MBM) was successfully applied to SBA. Similarly, MARS (Jung and Zweckstetter 2004b) matches RDCs to those calculated from a known structure. An SBA algorithm should be robust with respect to structural noise and handle distant structural templates: A small change in the putative structure should not change the assignments drastically and it should work even when a close structural template is not available.
NVR (Langmead et al. 2003; Langmead and Donald 2004a) is an MR-like approach for SBA of resonances and sparse NOEs. NVR computes assignments that correlate experimentally-measured HN–15N HSQC, HN−15N RDCs (in two media), 3D NOESY-15N-HSQC spectra (dNN’s) and amide exchange rates, to a given backbone structural model. The algorithm requires only uniform 15N-labeling of the protein. The NMR data used by NVR can be acquired relatively rapidly compared to the traditional suite of experiments used to perform assignments. NVR runs in minutes and assigns with high accuracy the (HN,15N) backbone resonances as well as the sparse dNN’s from the 3D 15N-NOESY spectrum. NVR works well only when the structure of the protein is known or for close templates (less than 2 Å backbone (bb) RMSD). SBA in general and NVR in particular have had an impact on algorithms for NMR methodology (Bailey-Kellogg et al. 2004; Vitek et al. 2005), and SBA has been important in the determination of protein structures (Potluri et al. 2006, 2007).
We introduce an algorithm that extends the circle of convergence of NVR such that distant templates can be used to obtain high assignment accuracies. We also improve NVR’s robustness with respect to structural noise. In addition, we provide a measure of confidence for individual assignments.
We demonstrate the generality of our approach (of using NMA ensembles around a given template with our voting scheme) with MARS, which is a significantly different SBA tool from NVR (in terms of its algorithm and its input data). MARS can use both 13C- and 15N-labeled data and takes as input the observed intra- and inter-residual chemical shifts grouped into pseudoresidues (PR). Depending on the type of available spectra, MARS uses chemical shifts of HiN, Ni, C′i−1, Cαi, Cαi−1, Cβi, Cβi−1, grouped into a PR with the HiN and Ni, obtained from a 15N-1H HSQC spectrum, serving as an anchor. In addition, when a template structure is available, MARS can use arbitrary RDCs from triple-resonance experiments to help the assignments. MARS is a hybrid assignment framework that optimizes local and global quality of fit of the amino acid sequence to the pseudoresidues. In the linking stage, it uses sequential connectivity information to link pseudoresidues into PR segments of length five down to two. In the matching stage, it maps these segments to the amino acid sequence to obtain the assignments. It compares these assignments with those obtained using a global energy function and retains the consistent assignments. MARS follows an iterative procedure, in which the experimental data are perturbed by adding noise to extract robust assignments. MARS computes reliability information for each assignment, labeling each assignment as having low, medium or high confidence. It also lists all possible assignments for a given PR, along with their probabilities. We demonstrate our algorithm on three proteins that come with the MARS software distribution (Jung and Zweckstetter 2004a), and corresponding templates. The templates are close structural homologs of the corresponding target proteins, with 100% sequence identity to the target proteins. The target proteins are: 76-residue human ubiquitin, 259-residue amino terminal domain of enzyme I from E. coli (EIN), and 370-residue maltose-binding protein (MBP). With our new technique, we show that the number of correct and reliable (high confidence) assignments increases in all test cases. As in Jung and Zweckstetter (2004b), we apply our framework to MARS with varying amounts of data, such as with and without sequential connectivity information, and up to three RDCs per residue. Depending on the amount of data used as input, the number of correct and reliable assignments increases by up to 23 (corresponding to a 3-fold increase in the number of correct assignments), at the expense of introducing three incorrect assignments. Furthermore, the number of incorrect assignments generally does not increase.
Using an ensemble of structures in SBA is reasonable, since the structures of proteins in the PDB presumably correspond only to the ground state of these proteins (Kay 1998). The NMR data acquired from a protein in solution corresponds to a time- and ensemble-average over the many conformations assumed during data acquisition. We use NMA to perturb the template to obtain an ensemble of structures. NMA is a technique commonly used to study the low-frequency motion of proteins. It represents the energy landscape around a given energy minimum with a harmonic approximation and solves for the equations of motion within that well analytically. It has been shown that over half of the known protein movements can be modeled by displacing the protein along at most two low frequency normal modes (Krebs et al. 2002). Furthermore, NMA has been shown to reproduce the deformations in the core of homologous proteins caused by sequence differences in 35 large, diverse, and well studied superfamilies (Leo-Macias et al. 2005). Therefore, it seems reasonable to expect that the conformational differences between the template and the target protein can be modeled by NMA. In contrast to classical molecular motion simulation techniques such as molecular dynamics, NMA can very rapidly compute an ensemble of structures that correspond to the likely conformations assumed by the molecule around its energy minimum. NMA has been successful in predicting experimental quantities such as temperature factors of proteins (Bahar et al. 1997). We use coarse-grained NMA where several amino acids are grouped into a single super-residue which effectively removes the small scale fluctuations of a protein such as sidechain motions to model the slow, large-scale motions (such as backbone rearrangements) (Suhre and Sanejouand 2004a).
To the best of our knowledge, ours is the first approach that uses ensembles for structure-based resonance assignments. Note that previously ensembles have been used successfully in structure determination (given assignments) (Best and Vendruscolo 2004), and for NOE assignments (given resonance assignments) (Mumenthaler et al. 1997; Güntert 2004). Our results show that ensemble-based approaches are also useful for structure-based resonance assignments.
NMA was analogously used by Suhre and Sanejouand (2004b) for protein structure determination by MR, using X-ray diffraction data. The authors observed that although the original template did not help solve the crystallographic phase problem, there existed a structure in the NMA ensemble that enabled the refinement of the target structure. This structure was chosen from the ensemble using a scoring function.
In summary, the main contributions of this paper are:
- The use of NMA structural ensembles in structure-based NMR assignments,
- “Robust” NMR assignments with respect to structural noise, by which we mean there is only a small change in assignment accuracy when the input structure changes slightly (note that this is not the case in general for maximum bipartite matching-based assignment algorithms, including Langmead and Donald 2004a),
- Increased radius of convergence of NVR with respect to the target–template structural similarity,
- Improved assignment accuracy of NVR for distant templates (by up to 22%),
- A confidence measure for each assignment,
- A demonstration of the generality of our framework by improving the assignment accuracy of MARS on three test proteins (by up to 3-fold), and
- A proof that our voting rule, which aggregates the assignments corresponding to individual models, is a maximum likelihood estimator.
NMR data used by NVR
An assignment algorithm must determine the mapping of the resonances and NOEs to the corresponding nuclei of the protein. We can define the assignment problem as the mapping of the peaks to the corresponding residues, due to the specific set of NMR data used by our framework.
We use the following NMR data: HN-15N HSQC, NOESY-15N-HSQC (yielding sparse dNN’s, observed between nearby pairs of amide protons), NH RDCs in two media (which provide global orientational restraints on NH amide bond vectors), 15N TOCSY (for the sidechain chemical shifts), and amide exchange HSQC (to identify, probabilistically, solvent-exposed amide protons).
For a given internuclear bond vector, the residual dipolar coupling r is

$$ r = D_{\max}\, \mathbf{v}^{T}\, \mathbf{S}\, \mathbf{v} \qquad (1) $$

Here Dmax is the dipolar interaction constant, v is the internuclear bond vector orientation relative to an arbitrary molecular frame, and S is the 3 × 3 Saupe order matrix which describes the average substructure alignment in the weakly-aligned anisotropic phase. Equation 1 shows the quadratic dependence of r on v, which explains the sensitivity of RDCs (and hence of SBA algorithms that use RDCs, such as NVR) to structural noise.
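To make the sensitivity argument concrete, the following minimal sketch evaluates an RDC for a bond vector and for slightly rotated copies of it. It assumes the standard quadratic form r = Dmax vᵀ S v, which matches the symbol definitions and the quadratic dependence described above. The function names, the value of `D_MAX`, and the entries of `SAUPE` are hypothetical and chosen only for illustration; they are not taken from the NVR software.

```python
import math

def rdc(d_max, v, saupe):
    """Return d_max * u^T S u, where u is v normalized to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    u = [c / norm for c in v]
    s_u = [sum(saupe[i][j] * u[j] for j in range(3)) for i in range(3)]
    return d_max * sum(u[i] * s_u[i] for i in range(3))

def rotate_about_y(v, angle_deg):
    """Rotate a 3-vector about the y axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    x, y, z = v
    return [x * math.cos(a) + z * math.sin(a),
            y,
            -x * math.sin(a) + z * math.cos(a)]

# Hypothetical inputs: arbitrary units for D_MAX, and a symmetric,
# traceless Saupe matrix (diagonal here only to keep the example short).
D_MAX = 1.0
SAUPE = [[0.001, 0.0, 0.0],
         [0.0, 0.002, 0.0],
         [0.0, 0.0, -0.003]]

bond = [0.0, 0.0, 1.0]
for angle in (0, 5, 10, 20):
    # Small rotations of the bond vector change the predicted RDC.
    print(angle, rdc(D_MAX, rotate_about_y(bond, angle), SAUPE))
```

Because the form is quadratic, v and −v give the same coupling, which is one reason RDC-based matching is ambiguous without additional lines of evidence.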
Only unambiguous dNN’s are used in NVR. Typically only a few unambiguous dNN’s (e.g., 43 for ubiquitin) can be obtained from the 3D-NOESY. These dNN’s are automatically-assigned as a byproduct of NVR’s resonance assignments (Langmead and Donald 2004a).
NVR is an automated SBA algorithm for proteins of known structure or with a known close structural homolog. NVR uses MBM in an expectation maximization (EM) framework to compute the assignments. Each peak p and residue r form the nodes of a bipartite graph (BPG), where one set of vertices is the set of peaks, the other set of vertices is the set of residues, and the edges correspond to the likelihood of assigning p to r in the bipartite graph. The EM framework is used to iteratively select the most likely (peak, residue) assignment. More details can be found in (Langmead and Donald 2004a).
NVR integrates various NMR data as a means to increase the signal-to-noise ratio. The signal is the computed likelihood of the assignment between a peak and the (correct) residue. The noise is the uncertainty in the data, where the probability mass is distributed among multiple residues. Each line of evidence (i.e., experiment) has noise, but the noise tends to be random and thus cancels when the lines of evidence are combined. Conversely, the signals embedded in each line of evidence tend to reinforce one another, resulting in relatively unambiguous assignments.
NVR has the advantage that it only needs 15N-labeled data, which is cheaper to obtain than 13C-labeling, which is required by many automated assignment algorithms. NVR only uses unassigned data.
Table 1: Assignment accuracy (% correct assignments) of NVR with distant templates for corresponding target proteins [table body not preserved; surviving column header: Sequence identity (%)]
We further tested our algorithm on three more proteins and a set of three structurally close templates, which were previously studied by Jung and Zweckstetter (2004b) and come with the MARS software distribution (Jung and Zweckstetter 2004a). The target proteins are human ubiquitin (PDB ID 1D3Z), the 259-residue amino terminal domain of enzyme I from E. coli (EIN) (PDB ID 3EZA), and the 370-residue maltose-binding protein (MBP) (PDB ID 1EZP). The set of NMR data used for these proteins, as well as the template information, is given in Table 3. Unlike in our tests with NVR, in which we used more distant templates, the templates here are structurally closer to the target structure (the bb RMSD ranges between 0.4 and 3.7 Å). Hence this study provides a test of our algorithm both with a significantly different SBA tool and with structurally similar templates.
For both NVR and MARS, we used an NMA webserver, elNémo (Suhre and Sanejouand 2004a) to obtain an ensemble of structures around the template. We computed the five lowest-frequency normal mode displacements, with default parameters. Each of the low frequency normal modes returned 11 structures, corresponding to the motion of the template along that normal mode. We thus obtain 55 structures. We also displaced the template structure bidirectionally along its two lowest frequency normal modes, resulting in a total of 176 structures per template. The bb RMSD of the most distant structure to the starting model is less than 3 Å.
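The structure counts above can be reproduced with a small sketch. It displaces a template along each of five modes in 11 steps (55 structures) and then along a grid spanned by the two lowest-frequency modes. The 11 × 11 grid is an assumption: it is the simplest scheme that yields the remaining 121 structures (55 + 121 = 176), but the text does not state the grid explicitly. The coordinates, mode vectors, amplitudes, and function names are hypothetical, and this is not the elNémo interface.

```python
def displace(coords, mode, amplitude):
    """Shift each atom by amplitude times its displacement vector in the mode."""
    return [tuple(x + amplitude * d for x, d in zip(atom, disp))
            for atom, disp in zip(coords, mode)]

def build_ensemble(coords, modes, steps=11, max_amp=1.0):
    """Return structures from 1D sweeps along every mode plus a 2D grid
    along the first two modes (assumed scheme, see lead-in)."""
    # Evenly spaced amplitudes in [-max_amp, +max_amp]; the middle one is 0.
    amps = [-max_amp + 2.0 * max_amp * k / (steps - 1) for k in range(steps)]
    ensemble = []
    for mode in modes:                      # 1D sweeps: len(modes) * steps
        for a in amps:
            ensemble.append(displace(coords, mode, a))
    for a in amps:                          # 2D grid: steps * steps
        for b in amps:
            ensemble.append(displace(displace(coords, modes[0], a), modes[1], b))
    return ensemble

# Hypothetical three-atom "template" and five toy modes.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
modes = [[(0.0, 0.1 * (k + 1), 0.0)] * len(coords) for k in range(5)]

ensemble = build_ensemble(coords, modes)
print(len(ensemble))
```

Note that the undisplaced template appears several times in such an ensemble (once per sweep and once in the grid); whether duplicates are kept or removed would affect the vote counts.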
Our algorithm runs in O(mn + mn^2.5 log(cn)) time, where m is the number of models in the ensemble, n is the number of residues in the target protein, and c is the maximum edge weight in an integer-weighted bipartite graph. In comparison, NVR runs in time O(n^2.5 log(cn)), whereas HD has a time complexity of O(pn^2.5 log(cn) + p log p + pn), where p is the number of proteins in a database of structural models. For a discussion of the complexity of NVR and HD, see Langmead and Donald (2004a, b) respectively. For reference, c is a constant and is dictated by the resolution of the NMR data. NVR runs in minutes on a desktop PC to assign a protein with about 56–128 residues using one template.
We ran NVR for all three target proteins, with the corresponding templates obtained from structural homologs, on each model in the ensemble obtained by NMA. We report the assignment accuracy for the template structure, as well as the range of accuracies in the NMA ensemble, in Table 1. It can be seen that if we could choose the right model from this ensemble, we would improve the assignment accuracy of NVR. However, this requires a scoring function that correlates strongly with the assignment accuracies.
Using a scoring function to choose a model from the ensemble
Suhre and Sanejouand (2004b) used NMA to perturb the structural model, and then chose a perturbed template structure with a scoring function (“free R factor”) in MR in X-ray crystallography, which allowed them to solve the target protein structure. We hypothesized that we could follow a similar methodology to choose a template from the NMA ensemble as input to NVR in NMR structure determination.
The HD score function combines the “preference list” of all the seven “voters” of NVR. These “voters” correspond to the NMR data used by NVR. They are: RDCs in two media, chemical shifts predicted using three different protocols (Langmead and Donald 2004a), amide-exchange and TOCSY data. Each “voter” has a ranked list of probabilities (“preference list”) for each peak, corresponding to the likelihood of matching that peak with each residue (e.g., according to RDCs). The HD scoring function (Langmead and Donald 2004b) simply multiplies and normalizes these probabilities to obtain an overall matrix representing the aggregated preference of all the voters for each peak. Given an assignment, the set of probabilities corresponding to individual (peak, residue) assignments are combined and returned as the HD score.
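The multiply-and-normalize step can be illustrated with a short sketch. Each voter is represented as a matrix of per-peak probabilities over residues, and the combined matrix is the row-normalized elementwise product. The text does not say how the per-pair probabilities of an assignment are combined into a single score, so the sum of log-probabilities used in `assignment_score` is an assumption, as are all names and numbers below.

```python
import math

def combine_voters(voters):
    """voters[k][p][r] is voter k's probability that peak p matches residue r.
    Returns the elementwise product over voters, normalized per peak."""
    n_peaks, n_res = len(voters[0]), len(voters[0][0])
    combined = []
    for p in range(n_peaks):
        row = [math.prod(v[p][r] for v in voters) for r in range(n_res)]
        total = sum(row)
        combined.append([x / total for x in row] if total > 0 else row)
    return combined

def assignment_score(combined, assignment):
    """Assumed rule: sum of log-probabilities of the chosen (peak, residue)
    pairs. Requires every chosen probability to be strictly positive."""
    return sum(math.log(combined[p][r]) for p, r in assignment.items())

# Two hypothetical voters (e.g., one RDC-based, one chemical-shift-based)
# for two peaks and two residues.
rdc_vote = [[0.6, 0.4], [0.5, 0.5]]
shift_vote = [[0.7, 0.3], [0.2, 0.8]]

combined = combine_voters([rdc_vote, shift_vote])
print(combined)
print(assignment_score(combined, {0: 0, 1: 1}))
print(assignment_score(combined, {0: 1, 1: 0}))
```

In this toy case the second voter breaks the tie that the first voter leaves for the second peak, which is the reinforcing effect of combining lines of evidence.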
MBM voting over the NMA ensemble
We used MBM to aggregate the assignments corresponding to all of the models in the NMA ensemble (see Methods). The results of this scheme are in the column entitled ‘Ensemble accuracy’ of Table 1. Across the three proteins and their corresponding templates, the assignment accuracy improves with respect to the starting structural model in all but one of the seven protein–template pairs, by up to 22%. We also combined the assignments of all the models corresponding to all four templates for human ubiquitin and both templates for SPG, obtaining an even higher accuracy (86% and 69%, respectively, shown in the column entitled ‘Ensemble accuracy’ and the row entitled ‘All templates’ of Table 1). For SPG using the template obtained from PDB ID 1JML (which is 8.7 Å bb RMSD from SPG), the assignment accuracy actually decreases with the consensus scheme. Note that the template obtained from 1JML is the farthest structure from its corresponding target structure in our test cases, and it may be that this starting template is outside the radius of convergence of NVR.
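One way to picture the voting step is as a maximum-weight bipartite matching on a vote matrix, where entry (p, r) counts the models that assign peak p to residue r. The sketch below is an illustration under that assumption, not the authors' implementation: it solves the matching with a textbook Hungarian algorithm, and all function names and the toy assignments are hypothetical.

```python
def hungarian_min(cost):
    """Minimum-cost assignment of rows to columns (rows <= columns).
    Returns a dict mapping row index -> column index."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0] * (n + 1), [0] * (m + 1)
    p, way = [0] * (m + 1), [0] * (m + 1)   # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:                       # augment along the found path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return {p[j] - 1: j - 1 for j in range(1, m + 1) if p[j] != 0}

def vote_matrix(assignments, n_peaks, n_residues):
    """Count, for each (peak, residue) pair, how many models chose it."""
    votes = [[0] * n_residues for _ in range(n_peaks)]
    for assignment in assignments:
        for peak, residue in assignment.items():
            votes[peak][residue] += 1
    return votes

def consensus_assignment(assignments, n_peaks, n_residues):
    """Maximize total votes by minimizing the negated vote matrix."""
    votes = vote_matrix(assignments, n_peaks, n_residues)
    return hungarian_min([[-x for x in row] for row in votes])

# Hypothetical per-model assignments (peak index -> residue index).
models = [{0: 0, 1: 1, 2: 2}, {0: 0, 1: 2, 2: 1},
          {0: 0, 1: 1, 2: 2}, {0: 1, 1: 0, 2: 2}]
consensus = consensus_assignment(models, 3, 3)
print(consensus)
```

Unlike a per-peak majority vote, the matching guarantees that no residue is assigned to two peaks, which matters when different models disagree in correlated ways.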
Given the assignments for each structural model in the NMA ensemble and the consensus assignments computed using MBM voting, we can compute the fraction of models that agree on a given resonance assignment. This ratio can be used as a ‘confidence’ measure for that assignment. Intuitively, the larger the number of models that agree on a (peak, residue) assignment, the less likely it is that that assignment is due to noise.
In Fig. 4, there are very few incorrect assignments for which more than half of the models agree. Therefore, we selected a threshold of 50%, and called an individual assignment ‘confident’ if more than 50% of the models agree on that assignment. The assignment accuracy of the confident assignments is in the last column of Table 1. We also combined all the models and report the corresponding ‘confident’ assignment accuracy. The ‘confident’ assignment accuracy is higher than the consensus assignment accuracy in all cases.
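The confidence measure and the 50% threshold described above can be sketched in a few lines. Given a consensus assignment and the per-model assignments, the confidence of each (peak, residue) pair is the fraction of models that agree with it, and the ‘confident’ subset keeps pairs whose confidence strictly exceeds the threshold. The data and function names are hypothetical.

```python
def confidence(consensus, assignments):
    """Fraction of models that agree with each consensus (peak, residue) pair."""
    n_models = len(assignments)
    return {peak: sum(1 for a in assignments if a.get(peak) == residue) / n_models
            for peak, residue in consensus.items()}

def confident_subset(consensus, assignments, threshold=0.5):
    """Keep consensus pairs supported by more than `threshold` of the models."""
    conf = confidence(consensus, assignments)
    return {peak: residue for peak, residue in consensus.items()
            if conf[peak] > threshold}

# Hypothetical per-model assignments and a consensus (peak -> residue).
models = [{0: 0, 1: 1, 2: 2}, {0: 0, 1: 2, 2: 1},
          {0: 0, 1: 1, 2: 2}, {0: 1, 1: 0, 2: 2}]
consensus = {0: 0, 1: 1, 2: 2}

print(confidence(consensus, models))
print(confident_subset(consensus, models))
```

Here peak 1 is supported by exactly half of the models and is therefore excluded, since the rule requires more than 50% agreement. Raising the threshold trades the number of assigned peaks for higher expected accuracy.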
Table: Effect of varying the confidence threshold: number of correct and incorrect peaks with NVR with distant templates for corresponding target proteins, with varying confidence thresholds [table body not preserved; surviving column headers: # Correct (# incorrect), given twice]
Robustness with respect to structural noise
Application of our framework to MARS
Table 3: Proteins used with MARS [table body not preserved; surviving column headers: Template (crystal) structure (PDB ID); Sequence identity (%); # of residues with data; # of residues; RDCs (PDB ID); bb RMSD (Å); CE RMSD (Å)]
Table: MARS assignment accuracy improves with our NMA ensemble-based voting algorithm [table body not preserved; surviving column headers: Chemical shifts for linking; Chemical shifts for matching; Reliable assignments, # correct (# incorrect). Surviving row groups: without sequential connectivity information (chemical shifts C′i−1, Cαi−1, Cβi−1; RDCs 1DNH, 1DNC′, 1DCaC′) and with sequential connectivity information (chemical shifts C′i−1, Cαi−1, Cαi, or C′i−1, Cαi−1, Cαi, Cβi−1, Cβi; RDCs 1DNH, 1DNC′, 1DCaC′)]
Discussion and conclusions
In this paper, we improved the assignment accuracy of NVR for distant structural models, and made it robust with respect to structural noise. On three different proteins, with distant structural homologs, we obtained an increased assignment accuracy compared to the initial structural model for all cases but one, which used the template farthest from the target structure in our test set. Even in this case, combining the ensembles from both templates still increased the assignment accuracy. We also calculated a measure of confidence in the individual assignments, and used this measure to assign a subset of the peaks with even higher assignment accuracy. We further demonstrated the general applicability of our approach to SBA by improving the assignment accuracy of MARS, a significantly different SBA algorithm from NVR.
Given a distant structural homolog, our methodology used NMA to obtain a set of structural models, which were then provided as input to NVR. We combined the NVR assignments for each of these structural models by maximum bipartite matching. The percentage of structural models that agreed on a given assignment provided the confidence measure. We also showed (see Appendix) that MBM is a maximum likelihood estimator of the correct assignments.
The greatest improvement with our ensemble-based assignments comes when we do not have sequential connectivity information. Nevertheless, modest improvements are seen even with sequential connectivities. Even these modest improvements are potentially useful, and our results represent a significant improvement over all previous structure-based assignment algorithms (e.g., Hus et al. 2002; Meiler and Baker 2003) for distant structural homologs (as opposed to the exact crystal structure). No former SBA algorithm performs well using even slightly distant homologs. For instance, the method of Hus et al. (2002) was tested only on the crystal structure. In Meiler and Baker (2003), assignment accuracies in the range of only 5% to 40% were obtained using ROSETTA models with 3–6 Å RMSD from the native structure. Our approach tests whether ensembles for assignments can begin to overcome this bottleneck, and forms a basis for SBA that can be improved in the future.
Our approach demonstrates that an ensemble of structures simulating the fluctuations of a protein in its native state improves the accuracy and robustness of SBA. Furthermore, our voting scheme reinforces the signal (the correct assignments), whereas the noise (incorrect assignments) cancels out. This is supported by the fact that we obtain high assignment accuracies despite the large fluctuations in assignment accuracies across the ensembles. Therefore, NMA is useful for both MR in X-ray crystallography (Suhre and Sanejouand 2004b) and SBA in NMR (this paper). Note that our results with MARS show that the best structure in the NMA ensemble helps improve the assignment accuracy with respect to the starting template, analogous to the findings of Suhre and Sanejouand (2004b). However, unlike Suhre and Sanejouand (2004b), we also show that an entire ensemble is useful to improve the assignment accuracy with NVR.
It is interesting that our voting scheme obtains an assignment accuracy that is greater than or equal to the maximum assignment accuracy achieved by any individual structure in the NMA ensembles, both with MARS and NVR, for most target protein–template pairs. This suggests that our voting scheme is more likely to improve the assignment accuracy than any single-structure scoring function.
An analysis of our assignments reveals that the confident assignments (with a confidence threshold of 0.9), which have 95% or higher assignment accuracy, mostly fall into regular secondary structure elements. For ubiquitin, GαIP and SPG, 1/5, 3/40 and 1/11 of the confident assignments fall into loop regions, respectively. Furthermore, for GαIP, the sixth helix contains most of the correct assignments. Similarly, most of the confident assignments of SPG are in its alpha helix. The secondary structure elements are the regions that are similar between the target and the template protein, and therefore it is expected that most of the correct assignments are found in those regions.
We envision three scenarios where our ensemble approach is useful. The first is for medium-sized proteins. One can perform a suite of triple-resonance experiments and use MARS with our ensemble method in order to improve MARS assignments, as was shown in this paper. Thus, we tested the hypothesis that SBA can be improved using ensembles, for medium-sized proteins. The second scenario is also for medium-sized proteins, but our NVR protocol requires only 15N-labeling and reduced spectrometer time. While RDCs must be measured, recent progress made by Tolman and co-workers (Ruan and Tolman 2005) makes it more convenient to find multiple alignment media for the proposed RDC measurements. Measurement of RDCs for small- to medium-sized proteins usually needs only 2D IPAP experiments, and thus can be done in less time. The third scenario is for large proteins, where one can hopefully collect chemical shifts, dNN’s, and RDCs (but other data might be hard to collect), and then use NVR with our ensemble-based technique. Since our algorithm requires only sparse data, this could make it less susceptible to the overlap problems that can occur with large proteins. Finally, since NVR requires only 15N-labeling, the cost of sample preparation is lower for the last two scenarios.
Our approach should be valuable in pharmacology and drug design (Ferentz and Wagner 2000) by helping assign proteins for which there is no close structural homolog available. One could use our scheme to assign a subset of peaks with high confidence, and then do a few more disambiguating NMR experiments (e.g., using selective labeling) in order to assign the remaining peaks. Furthermore, it is possible to run the algorithm iteratively, setting the confident assignments found in the previous iteration to boost the number of peaks reported with high confidence. Our method is simple and general, and can be used with other SBA algorithms, such as MARS, to improve their accuracy and robustness.
Our approach has some similarities to previous work such as MARS (Jung and Zweckstetter 2004b) and the method of Meiler and Baker (2003). Both of these works obtain multiple assignments for a protein, and retain the subset of peak-residue assignments that are consistent across those assignments. The difference is in how the assignments are computed. Jung and Zweckstetter (2004b) modulate the predicted chemical shifts by adding Gaussian noise and run MARS on perturbed data to obtain new assignments. Meiler and Baker (2003) start from random assignments and then use Monte Carlo search to optimize them. In contrast, we compute an ensemble of structures using NMA, and then use each structure to calculate a new assignment. Of these three approaches, ours is the only one that simulates the likely equilibrium conformations assumed by the template protein. It also has an intuitive correspondence with the NMR ensemble that generated the experimental data. As shown in section "Application of our framework to MARS", our approach can be used with MARS; it is likely that it can also be used with other SBA algorithms, such as that of Meiler and Baker (2003).
As future work, we are interested in developing a single-structure scoring function that takes into account the dependencies between various sources of NMR data. This would allow us to choose the model from the ensemble that has the highest assignment accuracy. Secondly, other techniques that characterize the flexibility of protein structures, such as FRODA (Wells et al. 2005) or the protein ensemble method (Shehu et al. 2006), could be used and compared with NMA through the lens of SBA. Finally, NVR currently returns a single assignment for each template, even though there may be many assignments consistent with the structural model. Incorporating backtracking into the assignments as in Vitek et al. (2005) to obtain all consistent assignments could improve the accuracy and robustness.
The NVR software as well as our scripts to run NVR on an ensemble of proteins and aggregate the results are available upon request. It is written in Matlab and Perl and is approximately 10K lines.
Our scripts to run MARS on an ensemble of templates and aggregate the results are less than 1K lines of code and are similarly available upon request.
We thank Drs. C. Bailey-Kellogg, P. Zhou, Mr. D. Keedy, Mr. J. MacMaster, Mr. C. Tripathy, Mr. A. Yan, Mr. M. Zeng and all members of the Donald Lab for discussions and comments. This work is supported by a grant to B.R.D. from the National Institutes of Health (R01 GM-65982).