Automated error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments: improved robustness and performance of the PASD algorithm

Kuszewski, John J.; Thottungal, Robin Augustine; Clore, G. Marius; Schwieters, Charles D.

doi:10.1007/s10858-008-9255-1

Article
Published: 31 July 2008

Volume 41, pages 221–239, (2008)

Journal of Biomolecular NMR

John J. Kuszewski¹,
Robin Augustine Thottungal¹,
G. Marius Clore² &
…
Charles D. Schwieters¹

Abstract

We report substantial improvements to the previously introduced automated NOE assignment and structure determination protocol known as PASD (Kuszewski et al. (2004) J Am Chem Soc 126:6258–6273). The improved protocol includes extensive analysis of input spectral data to create a low-resolution contact map of residues expected to be close in space. This map is used to obtain reasonable initial guesses of NOE assignment likelihoods, which are refined during subsequent structure calculations. Information in the contact map about which residues are predicted not to be close in space is applied via conservative repulsive distance restraints, which are used in early phases of the structure calculations. In comparison with the previous protocol, the new protocol requires significantly less computation time. We show results of running the new PASD protocol on six proteins and demonstrate that useful assignment and structural information is extracted on proteins of more than 220 residues. We show that useful assignment information can be obtained even in the case in which a unique structure cannot be determined.

Notes

Target values and widths (θ_t and θ_w, respectively) for this dihedral potential are calculated as $\theta_t = \frac{1}{2}(\theta_{\rm min} + \theta_{\rm max})$ and $\theta_w = \hbox{max}[\frac{1}{2}(\theta_{\rm max}-\theta_{\rm min})+5^\circ , 20^\circ],$ where θ_min and θ_max are, respectively, the minimum and maximum values of the torsion angle among TALOS’s database matches for a given residue.
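
The two expressions in this note can be sketched in a few lines of Python. This is only an illustration: the function name and the keyword arguments `pad` and `min_width` are hypothetical, and the sketch assumes the matched angles do not straddle the ±180° wrap-around, which the note does not discuss.

```python
def dihedral_target_and_width(theta_min, theta_max, pad=5.0, min_width=20.0):
    """Return (theta_t, theta_w) in degrees from the extreme database matches.

    theta_min, theta_max: smallest and largest matched torsion angle (degrees).
    pad, min_width: the 5 degree padding and 20 degree floor from the note.
    Assumption: no handling of periodicity across +/-180 degrees.
    """
    theta_t = 0.5 * (theta_min + theta_max)
    theta_w = max(0.5 * (theta_max - theta_min) + pad, min_width)
    return theta_t, theta_w


# A narrow cluster of matches hits the 20 degree floor ...
print(dihedral_target_and_width(-70.0, -50.0))
# ... while a broad cluster gets half its range plus the 5 degree padding.
print(dihedral_target_and_width(-150.0, -60.0))
```

The floor keeps a restraint from becoming unrealistically tight when all database matches happen to agree closely.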

References

Bartels C, Xia TH, Billeter M, Güntert P, Wüthrich K (1995) The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J Biomol NMR 6:1–10
Bax A, Kontaxis G, Tjandra N (2001) Dipolar couplings in macromolecular structure determination. Meth Enzymol 339:127–174
Busam RD, Lehtio L, Arrowsmith CH, Collins R, Dahlgren LG, Edwards AM, Flodin S, Flores A, Graslund S, Hammarstrom M, Hallberg BM, Herman MD, Johansson A, Johansson I, Kallas A, Karlberg T, Kotenyova T, Moche M, Nilsson ME, Nordlund P, Nyman T, Persson C, Sagemark J, Sundstrom M, Svensson L, Thorsell AG, Tresaugues L, Van den Berg S, Weigelt J, Welin M, Berglund H. Crystal structure of human thiamine triphosphatase. To be published
Bewley CA, Gustafson KR, Boyd MR, Covell DG, Bax A, Clore GM, Gronenborn AM (1998) Solution structure of cyanovirin-N, a potent HIV-inactivating protein. Nat Struct Biol 5:571–578
Billeter M, Braun W, Wüthrich K (1982) Sequential resonance assignments in protein ¹H nuclear magnetic resonance spectra: computation of sterically allowed proton-proton distances and statistical analysis of proton-proton distances in single crystal protein conformations. J Mol Biol 155:321–346
BMRB NMR-STAR Data Dictionary (2004) http://www.bmrb.wisc.edu/dictionary/htmldocs/nmr_star/dictionary.html
Brüschweiler R, Blackledge M, Ernst RR (1991) Multi-conformational peptide dynamics derived from NMR data: a new search algorithm and its application to antamanide. J Biomol NMR 1:3–11
Cavalli A, Salvatella X, Dobson CM, Vendruscolo M (2007) Protein structure determination from NMR chemical shifts. Proc Natl Acad Sci USA 104:9615–9620
Clore GM, Gronenborn AM (1989) Determination of three-dimensional structures of proteins and nucleic acids in solution by nuclear magnetic resonance spectroscopy. Crit Rev Biochem Mol Biol 24:479–564
Clore GM, Gronenborn AM (1991a) Applications of three- and four-dimensional heteronuclear NMR spectroscopy to protein structure determination. Progr Nucl Magn Reson Spectrosc 23:43–92
Clore GM, Gronenborn AM (1991b) Two, three and four dimensional NMR methods for obtaining larger and more precise three-dimensional structures of proteins in solution. Ann Rev Biophys Biophys Chem 20:29–63
Clore GM, Kuszewski J (2002) χ₁ Rotamer populations and angles of mobile surface side chains are accurately predicted by a torsion angle database potential of mean force. J Am Chem Soc 124:2866–2867
Clore GM, Nilges M, Sukumaran DK, Brünger AT, Karplus M, Gronenborn AM (1986) The three-dimensional structure of α1-purothionin in solution: combined use of nuclear magnetic resonance, distance geometry and restrained molecular dynamics. EMBO J 5:2729–2735
Clore GM, Gronenborn AM, Nilges M, Ryan CA (1987) The three-dimensional structure of potato carboxypeptidase inhibitor in solution: a study using nuclear magnetic resonance, distance geometry and restrained molecular dynamics. Biochemistry 26:8012–8023
Cornilescu G, Delaglio F, Bax A (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13:289–302
Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293
de Vlieg J, Boelens R, Scheek RM, Kaptein R, van Gunsteren WF (1986) Restrained molecular dynamics procedure for protein tertiary structure determination from NMR data: a lac repressor headpiece structure based on information on J-coupling and from presence and absence of NOEs. Isr J Chem 27:181–188
Garrett DS, Powers R, Gronenborn AM, Clore GM (1991) A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J Magn Reson 95:214–220
Garrett DS, Seok Y-J, Liao DT, Peterkofsky A, Gronenborn AM, Clore GM (1997) Solution structure of the 30 kDa N-terminal domain of enzyme I of the Escherichia coli phosphoenolpyruvate:sugar phosphotransferase system by multidimensional NMR. Biochemistry 36:2517–2530
Goto NK, Gardner KH, Mueller GA, Willis RC, Kay LE (1999) A robust and cost-effective method for the production of Val, Leu, Ile (δ1) methyl-protonated¹⁵N-,¹³C-,²H-labeled proteins. J Biomol NMR 13:369–374
Grishaev A, Llinás M (2002) CLOUDS, a protocol for deriving a molecular proton density via NMR. Proc Natl Acad Sci USA 99:6707–6712
Grishaev A, Wu J, Trewhella J, Bax A (2005) Refinement of multidomain structures by combination of solution small-angle X-ray scattering and NMR data. J Am Chem Soc 127:16621–16628
Güntert P (2003) Automated NMR protein structure calculation. Prog Nucl Magn Reson Spectrosc 43:105–125
Herrmann T, Güntert P, Wüthrich K (2002a) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J Mol Biol 319:209–227
Herrmann T, Güntert P, Wüthrich K (2002b) Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J Biomol NMR 24:171–189
Huang YJ, Tejero R, Powers R, Montelione GT (2006) A topology-constrained distance network algorithm for protein structure determination from NOESY data. Prot Struct Funct Bioinf 62:587–603
Kuszewski J, Gronenborn AM, Clore GM (1996) Improving the quality of NMR and crystallographic protein structures by means of a conformational database potential derived from structure databases. Protein Sci 5:1067–1080
Kuszewski J, Schwieters CD, Garrett DS, Byrd RA, Tjandra N, Clore GM (2004) Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments. J Am Chem Soc 126:6258–6273
McFeeters RL, Altieri AS, Cherry S, Tropea JE, Waugh DS, Byrd RA (2007) The high-precision solution structure of Yersinia modulating protein YmoA provides insight into interaction with H-NS. Biochemistry 46:13975–13982
Nilges M (1993) A calculation strategy for the solution structure determination of symmetric dimers by ¹H-NMR. Proteins 17:297–309
Nilges M, Gronenborn AM, Brünger AT, Clore GM (1988) Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints: application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2. Protein Eng 2:27–38
Nilges M, Macias MJ, O’Donoghue SI, Oschkinat H (1997) Automated NOESY interpretation with ambiguous distance restraints: the refined NMR solution structure of the Pleckstrin homology domain from β-spectrin. J Mol Biol 269:408–422
Powers R, Garrett DS, March CJ, Frieden EA, Gronenborn AM, Clore GM (1993) The high-resolution, three-dimensional solution structure of human interleukin-4 determined by multidimensional heteronuclear magnetic resonance spectroscopy. Biochemistry 32:6744–6762
Ramelot TA, Cort JR, Yee AA, Guido V, Lukin JA, Arrowsmith CH, Kennedy MA. To be published
Rieping W, Habeck M, Bardiaux B, Bernard A, Malliavin TE, Nilges M (2007) ARIA2: automated NOE assignment and data integration in NMR structure calculation. Bioinformatics 23:381–382
Schwieters CD, Clore GM (2001) Internal coordinates for molecular dynamics and minimization in structure determination and refinement. J Magn Reson 152:288–302
Schwieters CD, Clore GM (2007) A physical picture of atomic motions within the Dickerson DNA dodecamer in solution derived from joint ensemble refinement against NMR and large angle X-ray scattering data. Biochemistry 46:1152–1166
Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM (2003) The Xplor-NIH NMR molecular structure determination package. J Magn Reson 160:66–74
Schwieters CD, Kuszewski JJ, Clore GM (2006) Using Xplor-NIH for NMR molecular structure determination. Progr NMR Spectrosc 48:47–62
Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu G, Eletsky A, Wu Y, Singarapu KK, Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D, Bax A (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci USA 105:4685–4690
Song J, Bettendorff L, Tonelli M, Markley JL (2008) Structural basis for the catalytic mechanism of mammalian 25 kDa thiamine triphosphatase. J Biol Chem 283:10939–10948
Summers MF, South TL, Kim B, Hare DR (1990) High-resolution structure of an HIV zinc fingerlike domain via a new NMR-based distance geometry approach. Biochemistry 29:329–340
Tang C, Iwahara J, Clore GM (2005) Accurate determination of leucine and valine side-chain conformations using U-[¹⁵N/¹³C/²H]/[¹H-(methyl/methine)-Leu/Val] isotope labeling, NOE pattern recognition and methine CγHγ/CβHβ residual dipolar couplings: application to the 34 kDa enzyme IIA^Chitobiose. J Biomol NMR 33:105–121
Theobald DL, Wuttke DS (2006a) Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem. Proc Natl Acad Sci USA 103:18521–18527
Theobald DL, Wuttke DS (2006b) THESEUS: Maximum likelihood superpositioning and analysis of macromolecular structures. Bioinformatics 22:2171–2172
Tjandra N, Garrett DS, Gronenborn AM, Bax A, Clore GM (1997) Defining long range order in NMR structure determination from the dependence of heteronuclear relaxation times on rotational diffusion anisotropy. Nat Struct Biol 4:443–449
Verlet L (1967) Computer “experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys Rev 159:98–103
Wilcox GR, Fogh RH, Norton RS (1993) Refined structure in solution of the sea anemone neurotoxin ShI. J Biol Chem 268:24707–24719
Wlodawer A, Pavlovsky A, Gustchina A (1992) Crystal structure of human recombinant interleukin-4 at 2.25 Å resolution. FEBS Lett 309:59–64
Yee A, Chang X, Pineda-Lucena A, Wu B, Semesi A, Le B, Ramelot T, Lee GM, Bhattacharyya S, Gutierrez P, Denisov A, Lee CH, Cort JR, Kozlov G, Liao J, Finak G, Chen L, Wishart D, Lee W, McIntosh LP, Gehring K, Kennedy MA, Edwards AM, Arrowsmith CH (2002) An NMR approach to structural proteomics. Proc Natl Acad Sci USA 99:1825–1830

Acknowledgements

This work was supported by the CIT (to CDS) and NIDDK (to GMC) Intramural Research Programs of the NIH.

Author information

Authors and Affiliations

Imaging Sciences Laboratory, Center for Information Technology, National Institutes of Health, Building 12A, Bethesda, MD, 20892-5624, USA
John J. Kuszewski, Robin Augustine Thottungal & Charles D. Schwieters
Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 5, Bethesda, MD, 20892-0510, USA
G. Marius Clore

Corresponding authors

Correspondence to G. Marius Clore or Charles D. Schwieters.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10858_2008_9255_MOESM1_ESM.pdf

Appendix: Domain determination using a maximum likelihood fitting procedure

In order to fit subregions of structures which do not have high overall similarity, we have implemented a version of the maximum likelihood (ML) algorithm developed by Theobald and Wuttke (2006a) with a minor simplifying alteration which yields slightly improved results. In short, we have implemented the algorithm outlined in the supplementary material of Theobald and Wuttke (2006b) in which the following quantity is maximized:

$$ -\frac{1}{2}\sum_i^N || ({\mathbf{X}}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf{R}}_i - {\mathbf{M}}||_{{\varvec{\Upsigma}}^{-1}}^2 - \frac{3N}{2}\ln |{\varvec{\Upsigma}}|, $$

(17)

where $|{\mathbf{U}}|$ denotes the determinant of matrix U and $||{\mathbf{A}}||_{\mathbf B}^2 = \hbox{Tr } {\mathbf{A}}^T{\mathbf{BA}}.$ ${\mathbf{X}}_i$ is a K × 3 matrix of coordinates of the input structures, ${\mathbf{1}}_K$ is a K-dimensional vector with all elements set to one, ${\mathbf{R}}_i$ and ${\mathbf{t}}_i$ are, respectively, the rotation matrix and translation vector determined in the fitting process, while M corresponds to the average coordinates

$$ {\mathbf M} = \frac{1}{N} \sum_i^N {\mathbf X}_i{\mathbf R}_i. $$

(18)

${\varvec{\Upsigma}}$ is the K × K coordinate covariance matrix whose inverse weights the fit of the coordinates such that coordinates with larger variances do not contribute as much to the fit. If ${\varvec{\Upsigma}}$ is set to the identity matrix, Eq. 17 reduces to standard least squares coordinate fitting. Coordinate precision can be expressed in terms of ${\varvec{\Upsigma}}$ in a form analogous to the standard least squares RMSD:

$$ \hbox{RMSD}_{ML} = \sqrt{\frac{K}{\hbox{Tr } {\varvec{\Upsigma}}^{-1}}}. $$

(19)
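
As a quick check on Eq. 19, it helps to work through a special case that is not part of the original text. Assume, purely for illustration, that the covariance matrix is diagonal with per-atom variances $\sigma_k^2$:

$$ {\varvec{\Upsigma}} = \hbox{diag}(\sigma_1^2,\ldots,\sigma_K^2) \quad\Rightarrow\quad \hbox{Tr } {\varvec{\Upsigma}}^{-1} = \sum_{k=1}^K \sigma_k^{-2}, \qquad \hbox{RMSD}_{ML} = \sqrt{\frac{K}{\sum_{k=1}^K \sigma_k^{-2}}}. $$

Under this assumption $\hbox{RMSD}_{ML}^2$ is the harmonic mean of the atomic variances, and it reduces to $\hbox{RMSD}_{ML} = \sigma$ when all $\sigma_k = \sigma$. Because a harmonic mean is dominated by its smallest terms, a few poorly defined atoms raise $\hbox{RMSD}_{ML}$ far less than they would raise an arithmetic mean of the variances.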

Maximizing Eq. 17 balances two objectives: making structures as similar as possible to the mean, while minimizing the structure spread. The maximum likelihood estimate for the covariance matrix is

$$ {\varvec{\Upsigma}} = \frac{1}{3N} \sum_i^N [({\mathbf{X}}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf{R}}_i-{\mathbf{M}}][ ({\mathbf{X}}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf{R}}_i-{\mathbf{M}}]^ T $$

(20)

where the sum is over all structures to fit. Expressions for the ML estimates of ${\mathbf{R}}_i$ and the associated coordinate translation ${\mathbf{t}}_i$ can be found in Theobald and Wuttke (2006b). ML coordinate fitting is an iterative process since each structure’s translation and rotation depend on ${\varvec{\Upsigma}}$, which in turn depends on the translation and rotation. However, convergence typically occurs fairly rapidly (in fewer than 30 iterations).

Now, strictly speaking, ${\varvec{\Upsigma}}$ cannot be inverted because it always has zero eigenvalues due in part to invariance of overall translation and rotation. In Theobald and Wuttke (2006b) trial values of ${\varvec{\Upsigma}}$ are perturbed such that the eigenvalues obey an inverse gamma distribution. However, as the off-diagonal covariances are fairly meaningless (and hence not considered in their default algorithm), they resort to approximating the eigenvalues as the diagonal atomic variances. We find the whole procedure cumbersome and unwarranted, since the diagonal elements are poor estimates of the true eigenvalues. Instead we simply perturb ${\varvec{\Upsigma}}$ with a small value:

$$ {\varvec{\Upsigma}} \rightarrow {\varvec{\Upsigma}} + \varepsilon {\mathbb 1} $$

(21)

where ${\mathbb 1}$ is a K × K unit matrix and ɛ is a small value (typically 10⁻⁴). For multiple systems we find that this procedure works slightly better (converges in fewer iterations) and gives nearly identical fits to the method of Theobald and Wuttke (2006b).
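
The iteration described by Eqs. 17–21 can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the Xplor-NIH implementation: the function names `weighted_rotation` and `ml_fit` are hypothetical, the translation and rotation updates are the standard covariance-weighted centroid and weighted Kabsch (SVD) solutions, which are assumed here to stand in for the expressions in Theobald and Wuttke (2006b), and the convergence test on the covariance matrix is likewise an assumption.

```python
import numpy as np


def weighted_rotation(X, M, W):
    """Proper rotation R minimising Tr[(X R - M)^T W (X R - M)] (weighted Kabsch)."""
    U, _, Vt = np.linalg.svd(X.T @ W @ M)
    # Flip the last singular direction if needed so that det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
    return U @ D @ Vt


def ml_fit(structures, eps=1e-4, max_iter=30, tol=1e-10):
    """Iteratively superpose N structures of K atoms each.

    Returns the fitted coordinates (N, K, 3), the mean M (K, 3), the
    perturbed covariance matrix (K, K) and RMSD_ML as defined in Eq. 19.
    """
    X = np.asarray(structures, dtype=float)
    N, K, _ = X.shape
    ones = np.ones(K)
    sigma = np.eye(K)                 # identity: the first pass is least squares
    M = X[0] - X[0].mean(axis=0)      # initial reference: the first structure
    for _ in range(max_iter):
        W = np.linalg.inv(sigma)
        w = W @ ones
        # Translation: move each covariance-weighted centroid to the origin.
        centroids = np.einsum("k,nkc->nc", w, X) / (ones @ w)
        centred = X - centroids[:, None, :]
        # Rotation: weighted fit of every structure onto the current mean.
        Y = np.stack([Xi @ weighted_rotation(Xi, M, W) for Xi in centred])
        M = Y.mean(axis=0)            # average of the fitted coordinates
        dev = Y - M
        # Eq. 20, followed by the small diagonal perturbation of Eq. 21.
        new_sigma = np.einsum("nkc,nlc->kl", dev, dev) / (3 * N) + eps * np.eye(K)
        converged = np.linalg.norm(new_sigma - sigma) < tol
        sigma = new_sigma
        if converged:
            break
    rmsd_ml = np.sqrt(K / np.trace(np.linalg.inv(sigma)))  # Eq. 19
    return Y, M, sigma, rmsd_ml
```

With the perturbation in place the covariance matrix is always invertible, so the loop needs no special handling of zero eigenvalues. For exact rigid-body copies of one structure the covariance collapses to ε times the identity, and RMSD_ML then equals the square root of ε.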

In our iterative domain determination method we take the 50 structures of the second PASD structure calculation, fit them using this modified fitting procedure, and collect those atoms whose fitted positional RMSD is less than a threshold ρ_thresh. We consider residues to be contiguous if their separation in the primary sequence is less than D_min, the number of residues in the smallest domain considered. If RMSD_ML < 1.5 Å we consider the selected atoms to be in a single domain. Otherwise, we repeat the procedure, considering only this subset of atoms, and we decrement ρ_thresh by Δρ_thresh. This process is repeated until the first domain is determined. Successive domains are determined by repeating the procedure, excluding the atoms in the previously determined domains. We use the parameters ρ_thresh = 3.5 Å, Δρ_thresh = 0.5 Å (decremented every other iteration), and D_min = 20 residues. For ThTP, the domain identification was found to be fairly insensitive to the RMSD threshold value. A script implementing this domain determination algorithm is now distributed with the Xplor-NIH package.
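
The control flow of this procedure can be sketched in plain Python. The sketch is illustrative only and is not the script distributed with Xplor-NIH: all names are hypothetical, the fitting step is passed in as a callable `fit(atoms)` assumed to return a per-atom positional RMSD dictionary and RMSD_ML for that atom set, and "every other iteration" is read as every second pass. How the contiguity criterion enters the atom selection is not spelled out above, so it appears only as a standalone helper.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Set, Tuple

# Hypothetical interface: fit(atoms) -> ({atom: positional RMSD in Å}, RMSD_ML in Å)
FitFn = Callable[[Set[int]], Tuple[Dict[int, float], float]]


def contiguous_segments(residues: Sequence[int], d_min: int = 20) -> List[List[int]]:
    """Group residue numbers whose sequence separation is less than d_min."""
    segments: List[List[int]] = []
    for res in sorted(set(residues)):
        if segments and res - segments[-1][-1] < d_min:
            segments[-1].append(res)
        else:
            segments.append([res])
    return segments


def find_domain(atoms: Iterable[int], fit: FitFn, rho_thresh: float = 3.5,
                delta_rho: float = 0.5, rmsd_ml_cutoff: float = 1.5,
                max_iter: int = 50) -> Optional[Set[int]]:
    """Shrink the atom selection until its RMSD_ML drops below the cutoff."""
    selected = set(atoms)
    per_atom, _ = fit(selected)
    for iteration in range(1, max_iter + 1):
        kept = {a for a in selected if per_atom[a] < rho_thresh}
        if not kept:
            return None                      # nothing left to call a domain
        per_atom, rmsd_ml = fit(kept)        # refit on the surviving subset
        if rmsd_ml < rmsd_ml_cutoff:
            return kept
        selected = kept
        if iteration % 2 == 0:               # assumed reading of "every other"
            rho_thresh -= delta_rho
            if rho_thresh <= 0:
                return None
    return None


def find_domains(atoms: Iterable[int], fit: FitFn, **kwargs) -> List[Set[int]]:
    """Find successive domains, excluding atoms already assigned to one."""
    remaining = set(atoms)
    domains: List[Set[int]] = []
    while remaining:
        domain = find_domain(remaining, fit, **kwargs)
        if domain is None:
            break
        domains.append(domain)
        remaining -= domain
    return domains
```

Passing the fit in as a callable keeps the selection logic independent of how the superposition is computed, so the loop can be exercised with a toy fit before it is connected to a real one.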

Glossary of terms and symbols

Active assignment: An NOE assignment which contributes to the linear (Pass 1) or quadratic (Pass 2) restraint terms. Whether an assignment is active or inactive is determined from its assignment likelihoods via the procedure described in Section “Determination of active peak assignments”.
Active peak: An NOE peak with one or more active assignments.
Assignment likelihood λ(i,j): The probability of the correctness of assignment j of peak i. λ_p is the previous likelihood of an assignment based on previously obtained information; in Pass 1 λ_p is denoted $\lambda_p^n$ and is based on the network contact map, while in Pass 2 previous likelihoods $\lambda_p^v$ are based on distance violations of the structures calculated in Pass 1. The violation likelihood λ_v is the probability of correctness of an assignment based on distance violations in the current structure. The overall peak assignment likelihood λ_o is a weighted average of previous and violation likelihoods. The assignment likelihood λ_a is used to determine which single assignment to use for a given peak during Pass 2.
Broad tolerance Δ_B: The size of chemical shift bins used in the initial assignment procedure. [Section “Shift assignment stripe correction”]
Calibration peak: An NOE peak corresponding to an intraresidue or backbone sequential connectivity; such peaks are used for stripe correction and network analysis. [Section “Shift assignment stripe correction”]
Characteristic violation distance Δr_c: Distance used in determining assignment likelihood λ_v. Smaller values reduce the likelihood of assignments with large violations. [Eq. 13]
Linear NOE potential E_lin: Energy term used in Pass 1 which is linear in NOE violation. [Eq. 6]
Network score R(a,b): The residue pair score between residues a and b, based on connectivities deduced from the initial collection of possible NOE assignments. R′(a,b) is the normalized score used for assigning initial likelihoods; associated assignments are specified as active for R′ > R_c. Larger R′ corresponds to a larger number of connections. [Eqs. 1 and 2]
Peak assignment: A specific NOE peak assignment relating a single peak to a pair of assigned chemical shifts.
Previous likelihood weight w_p: Weight determining the contribution of λ_p and λ_v to λ_o. [Eq. 14]
Quadratic NOE potential E_quad: Energy term used in Pass 2 which is quadratic in NOE violation. [Eq. 10]
Repulsive distance potential E_repul: Energy term used in Pass 1 which repels atoms associated with shift assignments which are inactive. [Eq. 11]
Stripe coverage C: The fraction of calibration peaks consistent with a particular chemical shift assignment. [Section “Shift assignment stripe correction”]
Symmetry partners: Two NOE peaks with from- and to- assignments reversed.
Tight tolerance Δ_T: The size of chemical shift bins used during peak assignment after the stripe correction procedure. [Section “Shift assignment stripe correction”]

About this article

Cite this article

Kuszewski, J.J., Thottungal, R.A., Clore, G.M. et al. Automated error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments: improved robustness and performance of the PASD algorithm. J Biomol NMR 41, 221–239 (2008). https://doi.org/10.1007/s10858-008-9255-1

Received: 09 May 2008
Accepted: 02 July 2008
Published: 31 July 2008
Issue Date: August 2008
DOI: https://doi.org/10.1007/s10858-008-9255-1
