Automated error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments: improved robustness and performance of the PASD algorithm

Journal of Biomolecular NMR

Abstract

We report substantial improvements to the previously introduced automated NOE assignment and structure determination protocol known as PASD (Kuszewski et al. (2004) J Am Chem Soc 126:6258–6273). The improved protocol includes extensive analysis of input spectral data to create a low-resolution contact map of residues expected to be close in space. This map is used to obtain reasonable initial guesses of NOE assignment likelihoods, which are refined during subsequent structure calculations. Information in the contact map about which residues are predicted not to be close in space is applied via conservative repulsive distance restraints, which are used in early phases of the structure calculations. In comparison with the previous protocol, the new protocol requires significantly less computation time. We show results of running the new PASD protocol on six proteins and demonstrate that useful assignment and structural information is extracted on proteins of more than 220 residues. We show that useful assignment information can be obtained even in the case in which a unique structure cannot be determined.

Notes

  1. Target values and widths (θt and θw, respectively) for this dihedral potential are calculated as \(\theta_t = \frac{1}{2}(\theta_{\rm min} + \theta_{\rm max})\) and \(\theta_w = \hbox{max}[\frac{1}{2}(\theta_{\rm max}-\theta_{\rm min})+5^\circ , 20^\circ],\) where θmin and θmax are, respectively, the minimum and maximum values of the torsion angle among TALOS’s database matches for a given residue.
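
The two formulas in this note can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, angles are taken in degrees, and it assumes the TALOS range does not wrap across ±180°, a case the note does not discuss.

```python
def dihedral_target_and_width(theta_min, theta_max):
    """Return (theta_t, theta_w) in degrees for a torsion-angle range.

    theta_min, theta_max: smallest and largest torsion angle among the
    database matches for one residue (assumed not to wrap past +/-180).
    """
    # Target is the midpoint of the observed range.
    theta_t = 0.5 * (theta_min + theta_max)
    # Width is half the range plus 5 degrees, but never below 20 degrees.
    theta_w = max(0.5 * (theta_max - theta_min) + 5.0, 20.0)
    return theta_t, theta_w


# Narrow range: half-range + 5 = 15, so the 20-degree floor applies.
print(dihedral_target_and_width(-70.0, -50.0))    # (-60.0, 20.0)
# Wide range: half-range + 5 = 45 exceeds the floor.
print(dihedral_target_and_width(-160.0, -80.0))   # (-120.0, 45.0)
```

The 20° floor keeps residues with very tightly clustered database matches from being over-restrained.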

References

  • Bartels C, Xia TH, Billeter M, Güntert P, Wüthrich K (1995) The program XEASY for computer-supported NMR spectral analysis of biological macromolecules. J Biomol NMR 6:1–10

  • Bax A, Kontaxis G, Tjandra N (2001) Dipolar couplings in macromolecular structure determination. Meth Enzymol 339:127–174

  • Busam RD, Lehtio L, Arrowsmith CH, Collins R, Dahlgren LG, Edwards AM, Flodin S, Flores A, Graslund S, Hammarstrom M, Hallberg BM, Herman MD, Johansson A, Johansson I, Kallas A, Karlberg T, Kotenyova T, Moche M, Nilsson ME, Nordlund P, Nyman T, Persson C, Sagemark J, Sundstrom M, Svensson L, Thorsell AG, Tresaugues L, Van den Berg S, Weigelt J, Welin M, Berglund H Crystal structure of human thiamine triphosphatase. To be Published

  • Bewley CA, Gustafson KR, Boyd MR, Covell DG, Bax A, Clore GM, Gronenborn AM (1998) Solution structure of cyanovirin-N, a potent HIV-inactivating protein. Nat Struct Biol 5:571–578

  • Billeter M, Braun W, Wüthrich K (1982) Sequential resonance assignments in protein 1H nuclear magnetic resonance spectra: computation of sterically allowed proton-proton distances and statistical analysis of proton-proton distances in single crystal protein conformations. J Mol Biol 155:321–346

  • BMRB NMR-STAR Data Dictionary (2004) http://www.bmrb.wisc.edu/dictionary/htmldocs/nmr_star/dictionary.html

  • Brüschweiler R, Blackledge M, Ernst RR (1991) Multi-conformational peptide dynamics derived from NMR data: a new search algorithm and its application to antamanide. J Biomol NMR 1:3–11

  • Cavalli A, Salvatella X, Dobson CM, Vendruscolo M (2007) Protein structure determination from NMR chemical shifts. Proc Natl Acad Sci USA 104:9615–9620

  • Clore GM, Gronenborn AM (1989) Determination of three-dimensional structures of proteins and nucleic acids in solution by nuclear magnetic resonance spectroscopy. Crit Rev Biochem Mol Biol 24:479–564

  • Clore GM, Gronenborn AM (1991a) Applications of three- and four-dimensional heteronuclear NMR spectroscopy to protein structure determination. Progr Nucl Magn Reson Spectrosc 23:43–92

  • Clore GM, Gronenborn AM (1991b) Two, three and four dimensional NMR methods for obtaining larger and more precise three-dimensional structures of proteins in solution. Ann Rev Biophys Biophys Chem 20:29–63

  • Clore GM, Kuszewski J (2002) χ1 Rotamer populations and angles of mobile surface side chains are accurately predicted by a torsion angle database potential of mean force. J Am Chem Soc 124:2866–2867

  • Clore GM, Nilges M, Sukumaran DK, Brünger AT, Karplus M, Gronenborn AM (1986) The three-dimensional structure of α1-purothionin in solution: combined use of nuclear magnetic resonance, distance geometry and restrained molecular dynamics. EMBO J 5:2729–2735

  • Clore GM, Gronenborn AM, Nilges M, Ryan CA (1987) The three-dimensional structure of potato carboxypeptidase inhibitor in solution: a study using nuclear magnetic resonance, distance geometry and restrained molecular dynamics. Biochemistry 26:8012–8023

  • Cornilescu G, Delaglio F, Bax A (1999) Protein backbone angle restraints from searching a database for chemical shift and sequence homology. J Biomol NMR 13:289–302

  • Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A (1995) NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293

  • de Vlieg J, Boelens R, Scheek RM, Kaptein R, van Gunsteren WF (1986) Restrained molecular dynamics procedure for protein tertiary structure determination from NMR data: a lac repressor headpiece structure based on information on J-coupling and from presence and absence of NOEs. Isr J Chem 27:181–188

  • Garrett DS, Powers R, Gronenborn AM, Clore GM (1991) A common sense approach to peak picking two-, three- and four-dimensional spectra using automatic computer analysis of contour diagrams. J Magn Reson 95:214–220

  • Garrett DS, Seok Y-J, Liao DT, Peterkofsky A, Gronenborn AM, Clore GM (1997) Solution structure of the 30 kDa N-terminal domain of enzyme I of the Escherichia coli phosphoenolpyruvate:sugar phosphotransferase system by multidimensional NMR. Biochemistry 36:2517–2530

  • Goto NK, Gardner KH, Mueller GA, Willis RC, Kay LE (1999) A robust and cost-effective method for the production of Val, Leu, Ile (δ1) methyl-protonated 15N-, 13C-, 2H-labeled proteins. J Biomol NMR 13:369–374

  • Grishaev A, Llinás M (2002) CLOUDS, a protocol for deriving a molecular proton density via NMR. Proc Natl Acad Sci USA 99:6707–6712

  • Grishaev A, Wu J, Trewhella J, Bax A (2005) Refinement of multidomain structures by combination of solution small-angle X-ray scattering and NMR data. J Am Chem Soc 127:16621–16628

  • Güntert P (2003) Automated NMR protein structure calculation. Prog Nucl Magn Reson Spectrosc 43:105–125

  • Herrmann T, Güntert P, Wüthrich K (2002a) Protein NMR structure determination with automated NOE assignment using the new software CANDID and the torsion angle dynamics algorithm DYANA. J Mol Biol 319:209–227

  • Herrmann T, Güntert P, Wüthrich K (2002b) Protein NMR structure determination with automated NOE-identification in the NOESY spectra using the new software ATNOS. J Biomol NMR 24:171–189

  • Huang YJ, Tejero R, Powers R, Montelione GT (2006) A topology-constrained distance network algorithm for protein structure determination from NOESY data. Prot Struct Funct Bioinf 62:587–603

  • Kuszewski J, Gronenborn AM, Clore GM (1996) Improving the quality of NMR and crystallographic protein structures by means of a conformational database potential derived from structure databases. Protein Sci 5:1067–1080

  • Kuszewski J, Schwieters CD, Garrett DS, Byrd RA, Tjandra N, Clore GM (2004) Completely automated, highly error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments. J Am Chem Soc 126:6258–6273

  • McFeeters RL, Altieri AS, Cherry S, Tropea JE, Waugh DS, Byrd RA (2007) The high-precision solution structure of Yersinia modulating protein YmoA provides insight into interaction with H-NS. Biochemistry 46:13975–13982

  • Nilges M (1993) A calculation strategy for the solution structure determination of symmetric dimers by 1H-NMR. Proteins 17:297–309

  • Nilges M, Gronenborn AM, Brunger AT, Clore GM (1988) Determination of three-dimensional structures of proteins by simulated annealing with interproton distance restraints: application to crambin, potato carboxypeptidase inhibitor and barley serine proteinase inhibitor 2. Protein Eng 2:27–38

  • Nilges M, Macias MJ, O’Donoghue SI, Oschkinat H (1997) Automated NOESY interpretation with ambiguous distance restraints: the refined NMR solution structure of the Pleckstrin homology domain from β-spectrin. J Mol Biol 269:408–422

  • Powers R, Garrett DS, March CJ, Frieden EA, Gronenborn AM, Clore GM (1993) The high-resolution, three-dimensional solution structure of human interleukin-4 determined by multidimensional heteronuclear magnetic resonance spectroscopy. Biochemistry 32:6744–6762

  • Ramelot TA, Cort JR, Yee AA, Guido V, Lukin JA, Arrowsmith CH, Kennedy MA. To be published

  • Rieping W, Habeck M, Bardiaux B, Bernard A, Malliavin TE, Nilges M (2007) ARIA2: automated NOE assignment and data integration in NMR structure calculation. Bioinformatics 23:381–382

  • Schwieters CD, Clore GM (2001) Internal coordinates for molecular dynamics and minimization in structure determination and refinement. J Magn Reson 152:288–302

  • Schwieters CD, Clore GM (2007) A physical picture of atomic motions within the Dickerson DNA dodecamer in solution derived from joint ensemble refinement against NMR and large angle X-ray scattering data. Biochemistry 46:1152–1166

  • Schwieters CD, Kuszewski JJ, Tjandra N, Clore GM (2003) The Xplor-NIH NMR molecular structure determination package. J Magn Reson 160:66–74

  • Schwieters CD, Kuszewski JJ, Clore GM (2006) Using Xplor-NIH for NMR molecular structure determination. Progr NMR Spectrosc 48:47–62

  • Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, Liu G, Eletsky A, Wu Y, Singarapu KK, Lemak A, Ignatchenko A, Arrowsmith CH, Szyperski T, Montelione GT, Baker D, Bax A (2008) Consistent blind protein structure generation from NMR chemical shift data. Proc Natl Acad Sci USA 105:4685–4690

  • Song J, Bettendorff L, Tonelli M, Markley JL (2008) Structural basis for the catalytic mechanism of mammalian 25 kDa thiamine triphosphatase. J Biol Chem 283:10939–10948

  • Summers MF, South TL, Kim B, Hare DR (1990) High-resolution structure of an HIV zinc fingerlike domain via a new NMR-based distance geometry approach. Biochemistry 29:329–340

  • Tang C, Iwahara J, Clore GM (2005) Accurate determination of leucine and valine side-chain conformations using U-[15N/13C/2H]/[1H-(methyl/methine)-Leu/Val] isotope labeling, NOE pattern recognition and methine CγHγ/CβHβ residual dipolar couplings: application to the 34 kDa enzyme IIAChitobiose. J Biomol NMR 33:105–121

  • Theobald DL, Wuttke DS (2006a) Empirical Bayes hierarchical models for regularizing maximum likelihood estimation in the matrix Gaussian Procrustes problem. Proc Natl Acad Sci 103:18521–18527

  • Theobald DL, Wuttke DS (2006b) THESEUS: Maximum likelihood superpositioning and analysis of macromolecular structures. Bioinformatics 22:2171–2172

  • Tjandra N, Garrett DS, Gronenborn AM, Bax A, Clore GM (1997) Defining long range order in NMR structure determination from the dependence of heteronuclear relaxation times on rotational diffusion anisotropy. Nat Struct Biol 4:443–449

  • Verlet L (1967) Computer “experiments” on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys Rev 159:98–103

  • Wilcox GR, Fogh RH, Norton RS (1993) Refined structure in solution of the sea anemone neurotoxin ShI. J Biol Chem 268:24707–24719

  • Wlodawer A, Pavlovsky A, Gustchina A (1992) Crystal structure of human recombinant interleukin-4 at 2.25 Å resolution. FEBS Lett 309:59–64

  • Yee A, Chang X, Pineda-Lucena A, Wu B, Semesi A, Le B, Ramelot T, Lee GM, Bhattacharyya S, Gutierrez P, Denisov A, Lee CH, Cort JR, Kozlov G, Liao J, Finak G, Chen L, Wishart D, Lee W, McIntosh LP, Gehring K, Kennedy MA, Edwards AM, Arrowsmith CH (2002) An NMR approach to structural proteomics. Proc Natl Acad Sci USA 99:1825–1830

Acknowledgements

This work was supported by the CIT (to CDS) and NIDDK (to GMC) Intramural Research Programs of the NIH.

Author information

Corresponding authors

Correspondence to G. Marius Clore or Charles D. Schwieters.

Electronic supplementary material

Below is the link to the electronic supplementary material.

10858_2008_9255_MOESM1_ESM.pdf

Appendix: Domain determination using a maximum likelihood fitting procedure

In order to fit subregions of structures which do not have high overall similarity, we have implemented a version of the maximum likelihood (ML) algorithm developed by Theobald and Wuttke (2006a) with a minor simplifying alteration which yields slightly improved results. In short, we have implemented the algorithm outlined in the supplementary material of Theobald and Wuttke (2006b) in which the following quantity is maximized:

$$ -\frac{1}{2}\sum_i^N || ({\mathbf{X}}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf{R}}_i - {\mathbf{M}}||_{{\varvec{\Upsigma}}^{-1}}^2 - \frac{3N}{2}\ln |{\varvec{\Upsigma}}|, $$
(17)

where \(|{\mathbf{U}}|\) denotes the determinant of matrix U and \(||{\mathbf{A}}||_{\mathbf B}^2 = \hbox{Tr } {\mathbf{A}}^T{\mathbf{BA}}.\) N is the number of structures and K the number of atoms being fit. \({\mathbf{X}}_i\) is the K × 3 matrix of coordinates of input structure i, \({\mathbf{1}}_K\) is a K-dimensional vector with all elements set to one, \({\mathbf{R}}_i\) and \({\mathbf{t}}_i\) are, respectively, the rotation matrix and translation vector determined in the fitting process, while M corresponds to the average coordinates

$$ {\mathbf M} = \frac{1}{N} \sum_i^N ({\mathbf X}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf R}_i. $$
(18)

\({\varvec{\Upsigma}}\) is the K × K coordinate covariance matrix whose inverse weights the fit of the coordinates such that coordinates with larger variances do not contribute as much to the fit. If \({\varvec{\Upsigma}}\) is set to the identity matrix, Eq. 17 reduces to standard least squares coordinate fitting. Coordinate precision can be expressed in terms of \({\varvec{\Upsigma}}\) in a form analogous to the standard least squares RMSD:

$$ \hbox{RMSD}_{ML} = \sqrt{\frac{K}{\hbox{Tr }{\varvec{\Upsigma}}^{-1}}}. $$
(19)
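
As a rough numerical illustration of Eq. 19, the sketch below evaluates RMSD_ML for two small diagonal covariance matrices with NumPy. The function name and the test matrices are illustrative assumptions, not part of the published protocol.

```python
import numpy as np


def rmsd_ml(sigma):
    """Eq. 19: sqrt(K / Tr(Sigma^-1)) for an invertible K x K covariance."""
    k = sigma.shape[0]
    return float(np.sqrt(k / np.trace(np.linalg.inv(sigma))))


# Isotropic case: every atom has variance 0.25, so the result is
# sqrt(0.25) = 0.5.
sigma_iso = 0.25 * np.eye(4)
print(rmsd_ml(sigma_iso))

# Three well-ordered atoms (variance 0.25) and one floppy atom (variance 25).
sigma_mixed = np.diag([0.25, 0.25, 0.25, 25.0])
print(rmsd_ml(sigma_mixed))

# For comparison: the square root of the arithmetic-mean variance, which is
# dominated by the floppy atom.
print(float(np.sqrt(np.trace(sigma_mixed) / 4)))
```

In the mixed case the inverse-weighted measure stays close to the spread of the well-ordered atoms (about 0.58), whereas the arithmetic-mean analogue is about 2.5. This is the sense in which coordinates with larger variances contribute less.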

Maximizing Eq. 17 balances two objectives: making structures as similar as possible to the mean, while minimizing the structure spread. The maximum likelihood estimate for the covariance matrix is

$$ {\varvec{\Upsigma}} = \frac{1}{3N} \sum_i^N [({\mathbf{X}}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf{R}}_i-{\mathbf{M}}][({\mathbf{X}}_i+{\mathbf{1}}_K{\mathbf{t}}_i^T){\mathbf{R}}_i-{\mathbf{M}}]^T $$
(20)

where the sum is over all structures to fit. Expressions for the ML estimates of \({\mathbf{R}}_i\) and the associated coordinate translation \({\mathbf{t}}_i\) can be found in Theobald and Wuttke (2006b). ML coordinate fitting is an iterative process since each structure’s translation and rotation depend on \({\varvec{\Upsigma}}\), which in turn depends on the translation and rotation. However, convergence typically occurs fairly rapidly (in fewer than 30 iterations).
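
The covariance estimate of Eq. 20 can be written compactly with NumPy. The sketch below is not the Xplor-NIH implementation: the function name and array layout are assumptions, the rotations and translations are taken as given rather than optimized, and the mean M is computed over the translated and rotated coordinates so that it matches the deviations in Eq. 20.

```python
import numpy as np


def ml_covariance(structures, rotations, translations):
    """Eq. 20 covariance estimate.

    structures:   (N, K, 3) array, one K x 3 coordinate matrix per structure
    rotations:    (N, 3, 3) array of rotation matrices R_i
    translations: (N, 3) array of translation vectors t_i
    Returns the K x K matrix Sigma.
    """
    n = structures.shape[0]
    # (X_i + 1_K t_i^T) R_i for every structure i.
    moved = np.einsum("nka,nab->nkb",
                      structures + translations[:, None, :], rotations)
    mean = moved.mean(axis=0)      # average coordinates M, shape (K, 3)
    dev = moved - mean             # deviations from M, shape (N, K, 3)
    # Sum over structures of dev_i dev_i^T, divided by 3N.
    return np.einsum("nka,nla->kl", dev, dev) / (3.0 * n)


# Toy input: N = 2 structures of K = 5 atoms, identity transforms.
rng = np.random.default_rng(0)
coords = rng.normal(size=(2, 5, 3))
sigma = ml_covariance(coords, np.stack([np.eye(3)] * 2), np.zeros((2, 3)))
print(sigma.shape)                    # (5, 5)
print(np.linalg.matrix_rank(sigma))   # at most 3 * (N - 1) = 3, so singular
```

Because the deviations sum to zero over the N structures, the rank of the estimate is bounded by 3(N − 1). With more atoms than that, the matrix cannot be inverted directly.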

Now, strictly speaking, \({\varvec{\Upsigma}}\) cannot be inverted because it always has zero eigenvalues due in part to invariance of overall translation and rotation. In Theobald and Wuttke (2006b) trial values of \({\varvec{\Upsigma}}\) are perturbed such that the eigenvalues obey an inverse gamma distribution. However, as the off-diagonal covariances are fairly meaningless (and hence not considered in their default algorithm), they resort to approximating the eigenvalues as the diagonal atomic variances. We find the whole procedure cumbersome and unwarranted, since the diagonal elements are poor estimates of the true eigenvalues. Instead we simply perturb \({\varvec{\Upsigma}}\) with a small value:

$$ {\varvec{\Upsigma}} \rightarrow {\varvec{\Upsigma}} + \varepsilon {\mathbb 1} $$
(21)

where \({\mathbb 1}\) is the K × K identity matrix and ɛ is a small value (typically \(10^{-4}\)). For multiple systems we find that this procedure works slightly better (converges in fewer iterations) and gives nearly identical fits to the method of Theobald and Wuttke (2006b).
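
The effect of the perturbation in Eq. 21 can be seen on a deliberately rank-deficient toy matrix. The sketch below uses NumPy; the helper name and the example matrix are illustrative assumptions.

```python
import numpy as np


def regularize(sigma, eps=1e-4):
    """Eq. 21: add eps times the identity to a K x K covariance matrix."""
    return sigma + eps * np.eye(sigma.shape[0])


# A rank-1 "covariance" built from a single deviation vector: two of its
# three eigenvalues are exactly zero, so it has no inverse.
d = np.array([[1.0], [-1.0], [0.0]])
sigma = d @ d.T

eigs_before = np.linalg.eigvalsh(sigma)              # ascending order
eigs_after = np.linalg.eigvalsh(regularize(sigma))
print(eigs_before)
print(eigs_after)      # every eigenvalue is shifted up by eps

# The perturbed matrix is positive definite and can be inverted.
sigma_inv = np.linalg.inv(regularize(sigma))
print(np.allclose(sigma_inv @ regularize(sigma), np.eye(3)))
```

Adding ɛ times the identity shifts every eigenvalue by ɛ and leaves the eigenvectors unchanged, so directions with large variance are essentially unaffected while the null directions receive a small positive variance.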

In our iterative domain determination method we take the 50 structures of the second PASD structure calculation, fit them using this modified fitting procedure, and collect those atoms whose positional RMSD after fitting is less than a threshold \(\rho_{\rm thresh}\). We consider residues to be contiguous if their separation in the primary sequence is less than \(D_{\rm min}\), the number of residues in the smallest domain considered. If \(\hbox{RMSD}_{ML} < 1.5\) Å, we consider the selected atoms to be in a single domain. Otherwise, we repeat the procedure, considering only this subset of atoms, and we decrement \(\rho_{\rm thresh}\) by \(\Delta\rho_{\rm thresh}\). This process is repeated until the first domain is determined. Successive domains are determined by repeating the procedure, excluding the atoms in the previously determined domains. We use the parameters \(\rho_{\rm thresh} = 3.5\) Å, \(\Delta\rho_{\rm thresh} = 0.5\) Å (decremented every other iteration), and \(D_{\rm min} = 20\) residues. For the ThTP domain determination, the domain identification was found to be fairly insensitive to the RMSD threshold value. A script implementing this domain determination algorithm is now distributed with the Xplor-NIH package.
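
Two bookkeeping pieces of this procedure, the contiguity rule and the threshold schedule, can be sketched in plain Python. Both function names are hypothetical, the fitting step itself is omitted, and the choice of which iterations carry the decrement is an assumption, since the text only says "every other iteration". The script distributed with Xplor-NIH may differ.

```python
def contiguous_segments(residues, d_min=20):
    """Group residue numbers into segments.

    Neighbouring residues (after sorting) belong to the same segment when
    their sequence separation is less than d_min.
    """
    segments = []
    for r in sorted(set(residues)):
        if segments and r - segments[-1][-1] < d_min:
            segments[-1].append(r)     # close enough: extend current segment
        else:
            segments.append([r])       # gap of d_min or more: new segment
    return segments


def rho_schedule(rho_start=3.5, d_rho=0.5, n_iter=6):
    """RMSD threshold for each iteration, lowered by d_rho every other one."""
    return [rho_start - d_rho * (i // 2) for i in range(n_iter)]


print(contiguous_segments([3, 4, 5, 10, 40, 41, 90]))
# [[3, 4, 5, 10], [40, 41], [90]]
print(rho_schedule())
# [3.5, 3.5, 3.0, 3.0, 2.5, 2.5]
```

With the stated defaults, residues 10 and 40 fall in different segments because they are 30 positions apart, which exceeds \(D_{\rm min} = 20\).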

Glossary of terms and symbols

Active assignment

An NOE assignment which contributes to the linear (Pass 1) or quadratic (Pass 2) restraint terms. Whether an assignment is active or inactive is determined from its assignment likelihoods via the procedure described in Section “Determination of active peak assignments”.

Active peak

An NOE peak with one or more active assignments.

Assignment likelihood λ(i,j)

The probability of the correctness of assignment j of peak i. \(\lambda_p\) is the previous likelihood of an assignment based on previously obtained information; in Pass 1 \(\lambda_p\) is denoted \(\lambda_p^n\) and is based on the network contact map, while in Pass 2 previous likelihoods \(\lambda_p^v\) are based on distance violations of the structures calculated in Pass 1. The violation likelihood \(\lambda_v\) is the probability of correctness of an assignment based on distance violations in the current structure. The overall peak assignment likelihood \(\lambda_o\) is a weighted average of previous and violation likelihoods. The assignment likelihood \(\lambda_a\) is used to determine which single assignment to use for a given peak during Pass 2.

Broad tolerance \(\Delta_B\)

The size of chemical shift bins used in the initial assignment procedure. [Section “Shift assignment stripe correction”]

Calibration peak

An NOE peak corresponding to an intraresidue or backbone sequential connectivity, used for stripe correction and network analysis. [Section “Shift assignment stripe correction”]

Characteristic violation distance \(\Delta r_c\)

Distance used in determining assignment likelihood \(\lambda_v\). Smaller values reduce the likelihood of assignments with large violations. [Eq. 13]

Linear NOE potential \(E_{\rm lin}\)

Energy term used in Pass 1 which is linear in NOE violation. [Eq. 6]

Network score R(a,b)

The residue pair score between residues a and b, based on connectivities deduced from the initial collection of possible NOE assignments. R′(a,b) is the normalized score used for assigning initial likelihoods; associated assignments are specified as active for \(R' > R_c\). Larger R′ corresponds to a larger number of connections. [Eqs. 1 and 2]

Peak assignment

A specific NOE peak assignment relating a single peak to a pair of assigned chemical shifts.

Previous likelihood weight \(w_p\)

Weight determining the contribution of \(\lambda_p\) and \(\lambda_v\) to \(\lambda_o\). [Eq. 14]

Quadratic NOE potential \(E_{\rm quad}\)

Energy term used in Pass 2 which is quadratic in NOE violation. [Eq. 10]

Repulsive distance potential \(E_{\rm repul}\)

Energy term used in Pass 1 which repels atoms associated with shift assignments which are inactive. [Eq. 11]

Stripe coverage C

The fraction of calibration peaks consistent with a particular chemical shift assignment. [Section “Shift assignment stripe correction”]

Symmetry partners

Two NOE peaks with from- and to- assignments reversed.

Tight tolerance \(\Delta_T\)

The size of chemical shift bins used during peak assignment after the stripe correction procedure. [Section “Shift assignment stripe correction”]
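
The glossary describes the overall likelihood λo as a weighted average of the previous likelihood λp and the violation likelihood λv, with the weight wp. Eq. 14 itself is not reproduced in this excerpt, so the sketch below assumes the simplest reading, a convex combination with weights wp and 1 − wp. The function name is hypothetical and the exact published form may differ.

```python
def overall_likelihood(lam_prev, lam_viol, w_prev):
    """Weighted average of previous and violation likelihoods.

    Assumed form (not taken from Eq. 14 verbatim):
        lam_o = w_prev * lam_prev + (1 - w_prev) * lam_viol
    """
    if not 0.0 <= w_prev <= 1.0:
        raise ValueError("w_prev must lie in [0, 1]")
    return w_prev * lam_prev + (1.0 - w_prev) * lam_viol


# An assignment favoured by earlier information but violated in the
# current structure, for three choices of the weight.
for w in (1.0, 0.5, 0.0):
    print(w, overall_likelihood(0.9, 0.1, w))
```

Under this reading, a weight of 1 relies entirely on previously obtained information, while a weight of 0 relies entirely on violations in the current structure.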

About this article

Cite this article

Kuszewski, J.J., Thottungal, R.A., Clore, G.M. et al. Automated error-tolerant macromolecular structure determination from multidimensional nuclear Overhauser enhancement spectra and chemical shift assignments: improved robustness and performance of the PASD algorithm. J Biomol NMR 41, 221–239 (2008). https://doi.org/10.1007/s10858-008-9255-1

  • DOI: https://doi.org/10.1007/s10858-008-9255-1