Performance of the WeNMR CS-Rosetta3 web server in CASD-NMR

van der Schot, Gijs; Bonvin, Alexandre M. J. J.

doi:10.1007/s10858-015-9942-7

Open access
Published: 17 May 2015

Volume 62, pages 497–502 (2015)

Journal of Biomolecular NMR

Abstract

Here we present the performance of the WeNMR CS-Rosetta3 web server in CASD-NMR, the critical assessment of automated structure determination by NMR. The CS-Rosetta server uses only chemical shifts for structure prediction, combined, when available, with a post-scoring procedure based on unassigned NOE lists (Huang et al. in J Am Chem Soc 127:1665–1674, 2005b, doi:10.1021/ja047109h). We compare the original submissions, made with a previous version of the server based on Rosetta version 2.6, with recalculated targets that use the new R3FP fragment picker for fragment selection and a new annotation of prediction reliability (van der Schot et al. in J Biomol NMR 57:27–35, 2013, doi:10.1007/s10858-013-9762-6), both implemented in the CS-Rosetta3 WeNMR server. In this second round of CASD-NMR, the WeNMR CS-Rosetta server performed much better than in the first round because only converged targets were submitted. Further, recalculation of all CASD-NMR targets with the new version of the server demonstrates that our new annotation of prediction quality gives reliable results. Predictions annotated as weak often provide useful models, but only for a fraction of the sequence, and should therefore be used with caution.

Introduction

An understanding of the three-dimensional (3D) structure of proteins at atomic resolution, and of their conformational variability and dynamics, is essential for a proper understanding of their function and their interactions with other proteins and ligands, and for rational drug design (van den Bedem and Fraser 2015). Currently, several techniques can produce protein structures at atomic resolution: X-ray crystallography and nuclear magnetic resonance spectroscopy (NMR), with cryo-electron microscopy (cryo-EM) now also reaching atomic resolution thanks to recent advances in detector technology and improved software and algorithms (Bai et al. 2015). NMR is limited in the size of the molecules it can study, but has the advantage over other methods that it can probe protein dynamics on time scales from picoseconds up to milliseconds and beyond.

The most time-consuming and difficult part of NMR structure elucidation is the assignment of the side-chain chemical shifts and of the NOE cross peaks, and several methods have been developed over the years to automate this process as much as possible, often in combination with structure calculations (Guerry and Herrmann 2011). Methods such as CS-ROSETTA (Shen et al. 2008), CHESHIRE (Cavalli et al. 2007) and CS23D (Wishart et al. 2008) avoid this step by exploiting the structural information contained in the readily available backbone chemical shifts. The backbone chemical shifts themselves reflect an appreciable amount of structural information, such as backbone and side-chain conformations, secondary structure, aromatic ring positions and the presence of hydrogen bonds. These methods use the backbone chemical shifts, together with a database of known protein structures and their backbone chemical shifts, to predict the 3D structure of proteins.

The standard CS-ROSETTA protocol consists of three steps: (1) fragment selection; (2) assembly of models from these fragments; and (3) model selection. In a recent paper we introduced a number of algorithmic advances for CS-ROSETTA, including the Rosetta3 fragment picker (R3FP) and a post-analysis procedure that annotates the reliability of the predicted structure and identifies the locally converged regions of the models (van der Schot et al. 2013). Together, these improvements were shown to increase the reliability and convergence of the final structures. The annotation is based on: (1) the total number of converged residues, (2) the significance of the ROSETTA energy gap, and (3) the quality of the chemical shift data. The label strong indicates that the converged regions are likely to be correct, whereas the label weak indicates that the converged regions have to be handled with care.

In this work we assess the impact of these recent developments by (re)predicting the structures of 19 CASD-NMR (critical assessment of automated structure determination by NMR) (Rosato et al. 2009, 2012) targets. We used the WeNMR (Wassenaar et al. 2012) CS-ROSETTA3 web service (https://www.wenmr.eu/wenmr/structure-calculation-software) (van der Schot et al. 2013), connected to the computational resources of the European Grid Initiative (EGI, www.egi.eu), for efficient CS-ROSETTA3 calculations. This service uses the new R3FP fragment picker for fragment selection, distributes the assembly step (using ROSETTA3.3) over the available nodes, and implements the new post-analysis procedure (van der Schot et al. 2013). The results are compared with our original structure predictions submitted to CASD-NMR.

Materials and methods

We evaluated our new structure prediction methodology on 19 CASD-NMR targets. The targets are named by their respective CASD-NMR and PDB IDs. All were provided by the Northeast Structural Genomics Consortium (Huang et al. 2005a) and represent a consistent data set made available via the WeNMR site (https://www.wenmr.eu/wenmr/casd-nmr). As for the CASD submission, we omitted target 2LOJ because of its large number of unusual and ‘flexible’ amino acids. The sequence length of the targets ranges from 50 to 149 amino acids, and flexible termini were excluded from the predictions.

Fragment selection

The CS-ROSETTA3 web service used the R3FP fragment picker for fragment selection, with only the backbone NMR chemical shift lists as input. Lists can be supplied in NMRPipe (TALOS) (Delaglio et al. 1995), NMR-Star 2.1 or NMR-Star 3.1 (BMRB) format (Doreleijers et al. 2003).
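To illustrate what a TALOS-style (NMRPipe) shift list looks like as input, here is a minimal sketch of a reader for it. It assumes the common layout of a `VARS` line naming the columns (`RESID RESNAME ATOMNAME SHIFT`), a `FORMAT` line, and whitespace-separated data rows; the function name `parse_talos_shifts` is hypothetical and this is not the parser used by the server, which also accepts NMR-Star files.

```python
def parse_talos_shifts(text):
    """Parse the data rows of a TALOS-style (NMRPipe) chemical shift table.

    Assumes a 'VARS' line naming the columns (e.g. RESID RESNAME ATOMNAME SHIFT)
    and whitespace-separated data rows; DATA, FORMAT and REMARK lines are skipped.
    """
    columns = None
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "VARS":
            columns = fields[1:]
        elif keyword in ("DATA", "FORMAT", "REMARK"):
            continue
        elif columns is not None and len(fields) == len(columns):
            record = dict(zip(columns, fields))
            # Convert the numeric columns when they are present.
            if "RESID" in record:
                record["RESID"] = int(record["RESID"])
            if "SHIFT" in record:
                record["SHIFT"] = float(record["SHIFT"])
            records.append(record)
    return records


# Hypothetical three-line example with backbone shifts for two residues.
example = """\
DATA SEQUENCE MQIF
VARS   RESID RESNAME ATOMNAME SHIFT
FORMAT %4d   %1s     %4s      %8.3f
   1 M CA  55.200
   1 M HA   4.100
   2 Q N  120.500
"""
shifts = parse_talos_shifts(example)
```

Reading column names from the `VARS` line, rather than hard-coding positions, keeps the sketch tolerant of files whose columns appear in a different order.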

Assembly

The CS-ROSETTA3 web service used the selected fragments in the ROSETTA3.3 assembly step. For each target, 50,000 models were generated automatically using the standard CS-ABRELAX protocol. Model generation was distributed over the available nodes of the worldwide WeNMR grid of the European Grid Initiative (EGI).

Conserved regions

The conserved regions of a protein structure prediction were determined using an adaptation of the Gaussian-weighted RMSD method (Damm and Carlson 2006). The 30 lowest-energy ROSETTA models were superimposed using a Gaussian scaling factor of 2 Å² (Damm and Carlson 2006). This procedure iteratively determines the set of residues on which the structures can be superimposed; residues with a root mean square fluctuation (RMSF) below 2 Å are considered converged. Gaps of fewer than 3 residues between two low-RMSF regions are ignored.
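Two pieces of this procedure can be sketched compactly: the Gaussian weighting that down-weights deviating residues during superposition, and the step that turns per-residue RMSF values into converged regions. The sketch below assumes the weights take the form w = exp(-d²/c) with c = 2 Å², and interprets "gaps are ignored" as merging interior gaps of at most 2 residues into the flanking converged regions; the iterative superposition itself is omitted, and the function names are hypothetical.

```python
import math


def gaussian_weights(deviations, c=2.0):
    """Gaussian weights w_i = exp(-d_i**2 / c), with d_i in Å and c in Å**2.

    Residues that deviate little get weight ~1; outliers are down-weighted,
    so the superposition concentrates on the structurally consistent core.
    """
    return [math.exp(-(d * d) / c) for d in deviations]


def converged_mask(rmsf, cutoff=2.0, max_gap=2):
    """Flag residues with RMSF < cutoff (Å) as converged.

    Interior gaps of at most `max_gap` residues (i.e. fewer than 3) that are
    flanked by converged residues on both sides are merged into the
    surrounding converged region; terminal gaps are left unconverged.
    """
    mask = [value < cutoff for value in rmsf]
    n = len(mask)
    i = 0
    while i < n:
        if mask[i]:
            i += 1
            continue
        j = i
        while j < n and not mask[j]:
            j += 1
        # [i, j) is a run of unconverged residues.
        if i > 0 and j < n and (j - i) <= max_gap:
            for k in range(i, j):
                mask[k] = True
        i = j
    return mask


def convergence_fraction(mask):
    """Fraction of residues that belong to a converged region."""
    return sum(mask) / len(mask) if mask else 0.0
```

For example, RMSF values of [0.5, 0.8, 3.0, 2.5, 0.7, 4.0, 4.1, 3.9, 0.6] Å give a two-residue gap that is merged and a three-residue gap that is kept, so six of nine residues count as converged.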

Annotation

The cs-class, convergence and energy-gap criteria were used to determine the annotation (van der Schot et al. 2013). The cs-class criterion is the fraction of residues classified as “GOOD” by TALOS+ (Shen et al. 2009). Convergence is the fraction of residues that are part of a conserved region. The energy gap is the difference between the median energy of the 10 lowest-energy models and the median energy of the 10 lowest-energy models more than 4 Å away from the best-energy model; this gap is mapped onto [0, 1] using a sigmoidal function. Predictions were annotated strong if the predictor \( P_{\text{sum}} = 0.08\,c_{\text{cs-class}} + 0.54\,c_{\text{convergence}} + 0.38\,c_{\text{energy-gap}} \) exceeded 0.68, and weak otherwise (van der Schot et al. 2013).
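The annotation rule can be written out as a short sketch. The weights and the 0.68 threshold come from the text; the sign convention of the raw energy gap (distant minus lowest, so that a clearer gap is positive) is an assumption, and because the parameters of the sigmoidal mapping are not given here, `annotate` expects the energy-gap criterion already mapped onto [0, 1]. Function names are hypothetical.

```python
import statistics

# Weights and threshold as given in the text (van der Schot et al. 2013).
W_CS_CLASS = 0.08
W_CONVERGENCE = 0.54
W_ENERGY_GAP = 0.38
STRONG_THRESHOLD = 0.68


def raw_energy_gap(energies, rmsd_to_best, n=10, cutoff=4.0):
    """Median energy of the n lowest-energy models lying > cutoff Å from the
    best model, minus the median energy of the n lowest-energy models.

    A larger positive gap means the low-energy models stand out more clearly.
    The sign convention is an assumption made for this sketch.
    """
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    lowest = [energies[i] for i in order[:n]]
    distant = [energies[i] for i in order if rmsd_to_best[i] > cutoff][:n]
    if not distant:
        raise ValueError("no models beyond the RMSD cutoff")
    return statistics.median(distant) - statistics.median(lowest)


def annotate(c_cs_class, c_convergence, c_energy_gap):
    """Return (P_sum, label); all three criteria must already lie in [0, 1]."""
    p_sum = (W_CS_CLASS * c_cs_class
             + W_CONVERGENCE * c_convergence
             + W_ENERGY_GAP * c_energy_gap)
    return p_sum, ("strong" if p_sum > STRONG_THRESHOLD else "weak")
```

Because convergence carries more than half of the total weight, a prediction with a low converged fraction is hard to push above 0.68 even with clean chemical shift data, which matches the observation that weak annotations mostly reflect limited convergence.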

Selection of models

The web service uses SPARTA+ (Shen and Bax 2010) chemical shift predictions to rescore and select the final models. For several targets, the chemical shift score was combined with the DP score (Huang et al. 2005b), which uses unassigned NOE peak lists and has been shown to improve model selection. As for the CASD submissions, the top 5 models after rescoring were used for the comparison.
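The exact way the two scores are combined is not specified here, so the following is only a hypothetical sketch of a top-5 selection: it assumes a chemical shift score where lower is better, a DP score where higher is better, and combines them as a weighted difference of z-scores. The weighting scheme and all names are illustrative assumptions, not the server's procedure.

```python
import statistics


def zscores(values):
    """Standardize values; a zero spread maps every value to 0."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values) or 1.0
    return [(v - mean) / spread for v in values]


def select_top_models(cs_scores, dp_scores=None, dp_weight=1.0, n=5):
    """Return indices of the n best models.

    cs_scores: chemical shift scores (lower = better agreement).
    dp_scores: optional DP scores (higher = better); combined here as a
    hypothetical weighted difference of z-scores.
    """
    combined = zscores(cs_scores)
    if dp_scores is not None:
        z_dp = zscores(dp_scores)
        combined = [c - dp_weight * d for c, d in zip(combined, z_dp)]
    ranking = sorted(range(len(cs_scores)), key=lambda i: combined[i])
    return ranking[:n]
```

Standardizing both scores first keeps one score from dominating simply because of its numerical range; `dp_weight` would then control how much the NOE-based evidence can reorder the chemical shift ranking.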

Evaluation

All root mean square deviations (RMSDs) are averages calculated over the backbone Cα, C and N atoms relative to the 20 reference structures deposited in the PDB, i.e. the average over all pairwise comparisons between the selected models and each of the 20 reference structures in the PDB entry.
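The averaged pairwise RMSD can be sketched with NumPy, assuming each model and reference conformer has already been reduced to the same backbone atoms (Cα, C, N) in the same order, optionally restricted to the conserved residues. Superposition uses the Kabsch algorithm; the function names are hypothetical and this is not the evaluation code used for the tables.

```python
import numpy as np


def superposed_rmsd(a, b):
    """RMSD between two matched (N, 3) coordinate sets after optimal
    rigid-body superposition (Kabsch algorithm, reflections excluded)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    # Flip the last axis if needed so that the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rotation.T - b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def average_pairwise_rmsd(models, references):
    """Mean RMSD over all model/reference pairs (e.g. 5 models x 20 conformers)."""
    values = [superposed_rmsd(m, r) for m in models for r in references]
    return sum(values) / len(values)
```

With 5 selected models and 20 reference conformers, `average_pairwise_rmsd` averages 100 superposed RMSD values.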

Results

We compared our original CASD-NMR submissions, both from the first CASD-NMR round, which has been evaluated previously (Rosato et al. 2012), and from the latest round, with predictions obtained using the CS-Rosetta3 server (van der Schot et al. 2013), which implements the new R3FP fragment picker for fragment selection. All targets were thus re-run in a consistent manner and automatically annotated to evaluate the reliability of the predictions.

Original CASD-NMR round 2 submissions

In the previous round of CASD-NMR we submitted predictions irrespective of the convergence of the top 5 models; in this second round we followed a more conservative approach, submitting predictions only for targets that showed convergence (using as a guideline an average RMSD of the top 5 models from the best model of roughly 2 Å or less). Models were submitted for 7 of the 10 CASD-NMR targets (HP2876B, StT322 and YR313A did not converge). The convergence and accuracy of these submissions are summarized in Table 1.

Table 1 Performance of the CS-Rosetta WeNMR web server in round 2 of CASD-NMR

Prediction and annotation using the CS-Rosetta3 server

Table 2 summarizes the structure predictions for all CASD-NMR targets to date. Six of the nineteen targets were annotated strong (i.e. reliable prediction) and thirteen were annotated weak. For the strong targets, on average 86 % of the sequence was regarded as conserved, and all had an average pairwise RMSD from the reference structure within 2 Å, calculated over the conserved regions. One target, 2KPT, converged with the new method (RMSD = 1.39 Å), whereas the original submission did not find the correct fold. For the other strong targets, the new protocol performs similarly to the old one.

Table 2 Performance of the CS-Rosetta WeNMR web server on all recalculated CASD-NMR targets

For the weak targets, shorter parts of the sequence were regarded as conserved (33 % on average), with an average pairwise RMSD from the reference structure below 2 Å for 12 out of 13 targets. The main reason for the weak annotation of these targets is the small fraction of the sequence showing convergence. For targets 2KJ6 and 2LTL, our protocol found the wrong fold for the converged regions.

Figure 1 shows an overview of the six strong targets. For each target the reference structures are in blue, and the predicted structures are in red, with unconverged regions in gray.

Performance of the CS-ROSETTA3 server

Figure 2 shows the average time for each step of the CS-Rosetta protocol. On average, a complete CS-Rosetta run, including fragment selection, model generation and post-analysis, takes 991 min (16.5 h) on the CS-Rosetta3 WeNMR server. Nearly 45 % of the total time is spent assembling the 50,000 models on the WeNMR EGI grid.

Discussion

Using the CASD-NMR targets, we have shown that, as predicted earlier (van der Schot et al. 2013), our annotation method is able to identify successful structure predictions. Six of the 19 targets were annotated strong. For these targets, the RMSD from the reference structure was below 2 Å, with on average 86 % of the sequence converged. This rather low percentage of strong annotations (31.6 %) leaves room for improvement. For example, the RASREC method used in our previous work (van der Schot et al. 2013) has been shown to increase the number of strong predictions. This method, however, requires a large number of CPU cores with MPI (Message Passing Interface) communication, which cannot currently be run on grid resources.

In the case of weak annotations, the “rigid” or converged regions identified in the predicted model can still be useful: in 85 % of these ‘weak’ cases the conserved regions are accurately predicted. However, targets 2KJ6 and 2LTL show that the results of weak predictions have to be used with care. 2LTL is an easy case: since only 10 % of its sequence converged, the complete structure should be disregarded. In contrast, 2KJ6 has 48 % of its sequence converged (a reasonably large fraction), but in a fold that differs from the reference structure, and apart from the annotation nothing indicates a wrong fold. We therefore recommend using weak predictions only with care and searching for experimental evidence (e.g. in the NOE peak lists) of their correctness.

Overall, if we restrict our earlier submitted models to the conserved regions, Table 2 shows that we successfully predicted the structure of these regions (RMSD from the target <2 Å) in 88 % of the submitted cases (15 out of 17). Six of these (40 %) correspond to strong annotations, with sequence coverage between 64 and 100 %.

Regarding the performance of the grid-enabled web server, distributing the jobs on the grid speeds up the calculations by a factor of ~900 compared with running on a single CPU (not a realistic scenario for Rosetta calculations; compared with a 100-CPU cluster, the speed-up would only be ~9-fold). Note that the server uses grid resources in an opportunistic manner, farming out 2500 jobs (for 50,000 models, each job calculating 20 models) to grid sites (currently 41 sites support WeNMR; see http://gstat.egi.eu/gstat/geo/openlayers#/VO/enmr.eu), and that grid computations come with some overhead in job handling and response time.
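The numbers quoted above can be checked with a few lines of arithmetic. The sketch below only re-derives figures stated in the text; the cluster comparison assumes, as a simplification, that a 100-CPU cluster would scale perfectly linearly, and the assembly time is an upper estimate from the "nearly 45 %" share of the 991 min average run.

```python
import math

TOTAL_MODELS = 50_000
MODELS_PER_JOB = 20
n_jobs = math.ceil(TOTAL_MODELS / MODELS_PER_JOB)  # jobs farmed out to grid sites

GRID_SPEEDUP_VS_SINGLE_CPU = 900  # approximate value quoted in the text
CLUSTER_CPUS = 100
# Assumes the cluster scales perfectly linearly with its CPU count.
speedup_vs_cluster = GRID_SPEEDUP_VS_SINGLE_CPU / CLUSTER_CPUS

AVERAGE_RUN_MIN = 991
run_hours = AVERAGE_RUN_MIN / 60
# "Nearly 45 %" of the run is assembly, so this is an upper estimate.
assembly_min_upper = 0.45 * AVERAGE_RUN_MIN
```

Under the linear-scaling assumption, the ~9-fold advantage over a 100-CPU cluster is simply the ~900-fold single-CPU speed-up divided by 100.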

In conclusion, in this second round of CASD-NMR the WeNMR CS-Rosetta server performed much better than in the first round, mainly because this time only converged targets were submitted, whereas in the first round all targets were submitted irrespective of their convergence. We have also demonstrated on the recalculated targets that our new annotation of prediction quality gives reliable results. Our annotations might seem rather conservative, considering that many targets annotated as weak show good similarity to the manually determined reference structures. These might still provide useful information for further NMR work, but should be used with care.

References

Aramini JM, Tubbs JL, Kanugula S et al (2010) Structural basis of O6-alkylguanine recognition by a bacterial alkyltransferase-like DNA repair protein. J Biol Chem 285:13736–13741. doi:10.1074/jbc.M109.093591
Bai X-C, McMullan G, Scheres SHW (2015) How cryo-EM is revolutionizing structural biology. Trends Biochem Sci 40:49–57. doi:10.1016/j.tibs.2014.10.005
Cavalli A, Salvatella X, Dobson CM, Vendruscolo M (2007) Protein structure determination from NMR chemical shifts. Proc Natl Acad Sci USA 104:9615–9620. doi:10.1073/pnas.0610313104
Damm KL, Carlson HA (2006) Gaussian-weighted RMSD superposition of proteins: a structural comparison for flexible proteins and predicted protein structures. Biophys J 90:4558–4573. doi:10.1529/biophysj.105.066654
Delaglio F, Grzesiek S, Vuister GW et al (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293. doi:10.1007/BF00197809
Doreleijers JF, Mading S, Maziuk D et al (2003) BioMagResBank database with sets of experimental NMR constraints corresponding to the structures of over 1400 biomolecules deposited in the Protein Data Bank. J Biomol NMR 26:139–146. doi:10.1023/A:1023514106644
Guerry P, Herrmann T (2011) Advances in automated NMR protein structure determination. Q Rev Biophys 44:257–309. doi:10.1017/S0033583510000326
Huang YJ, Moseley HNB, Baran MC et al (2005a) An integrated platform for automated analysis of protein NMR structures. Methods Enzymol Biothermodyn Part C 394:111–141. doi:10.1016/S0076-6879(05)94005-6
Huang YJ, Powers R, Montelione GT (2005b) Protein NMR recall, precision, and F-measure scores (RPF scores): structure quality assessment measures based on information retrieval statistics. J Am Chem Soc 127:1665–1674. doi:10.1021/ja047109h
Koga N, Tatsumi-Koga R, Liu G et al (2012) Principles for designing ideal protein structures. Nature 491:222–227. doi:10.1038/nature11600
Liu G, Huang YJ, Xiao R et al (2010) NMR structure of F-actin-binding domain of Arg/Abl2 from Homo sapiens. Proteins 78:1326–1330. doi:10.1002/prot.22656
Rosato A, Bagaria A, Baker D et al (2009) CASD-NMR: critical assessment of automated structure determination by NMR. Nat Methods 6:625–626. doi:10.1038/nmeth0909-625
Rosato A, Aramini JM, Arrowsmith C et al (2012) Blind testing of routine, fully automated determination of protein structures from NMR data. Structure 20:227–236. doi:10.1016/j.str.2012.01.002
Shen Y, Bax A (2010) SPARTA+: a modest improvement in empirical NMR chemical shift prediction by means of an artificial neural network. J Biomol NMR 48:13–22. doi:10.1007/s10858-010-9433-9
Shen Y, Lange O, Delaglio F et al (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci USA 105:4685–4690. doi:10.1073/pnas.0800256105
Shen Y, Delaglio F, Cornilescu G, Bax A (2009) TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR 44:213–223. doi:10.1007/s10858-009-9333-z
van den Bedem H, Fraser JS (2015) Integrative, dynamic structural biology at atomic resolution-it’s about time. Nat Methods 12:307–318. doi:10.1038/nmeth.3324
van der Schot G, Zhang Z, Vernon R et al (2013) Improving 3D structure prediction from chemical shift data. J Biomol NMR 57:27–35. doi:10.1007/s10858-013-9762-6
Wassenaar TA, van Dijk M, Loureiro-Ferreira N et al (2012) WeNMR: structural biology on the grid. J Grid Comput 10:743–767. doi:10.1007/s10723-012-9246-z
Wishart DS, Arndt D, Berjanskii M et al (2008) CS23D: a web server for rapid protein structure generation using NMR chemical shifts and sequence data. Nucl Acids Res 36:W496–W502. doi:10.1093/nar/gkn305
Wu B, Skarina T, Yee A et al (2010) NleG Type 3 effectors from enterohaemorrhagic Escherichia coli are U-Box E3 ubiquitin ligases. PLoS Pathog 6:e1000960. doi:10.1371/journal.ppat.1000960

Acknowledgments

The WeNMR project (European FP7 e-Infrastructure grant, Contract No 261572, www.wenmr.eu), supported by the European Grid Initiative (EGI) through the national GRID Initiatives of Belgium, France, Italy, Germany, the Netherlands, Poland, Portugal, Spain, UK, South Africa, China, Malaysia, Taiwan, the Latin America GRID infrastructure via the Gisela project, the International Desktop Grid Federation (IDGF) with its volunteers and the US Open Science Grid (OSG) are acknowledged for the use of web portals, computing and storage facilities.

Author information

Gijs van der Schot
Present address: Laboratory of Molecular Biophysics, Department of Cell and Molecular Biology, Uppsala University, Husargatan 3, Box 596, 75 124, Uppsala, Sweden

Authors and Affiliations

Faculty of Science – Chemistry, Bijvoet Center for Biomolecular Research, Utrecht University, Utrecht, The Netherlands
Gijs van der Schot & Alexandre M. J. J. Bonvin

Corresponding author

Correspondence to Alexandre M. J. J. Bonvin.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

van der Schot, G., Bonvin, A.M.J.J. Performance of the WeNMR CS-Rosetta3 web server in CASD-NMR. J Biomol NMR 62, 497–502 (2015). https://doi.org/10.1007/s10858-015-9942-7

Received: 11 March 2015
Accepted: 28 April 2015
Published: 17 May 2015
Issue Date: August 2015
DOI: https://doi.org/10.1007/s10858-015-9942-7
