Structure-based protein NMR assignments using native structural ensembles

Abstract

An important step in NMR protein structure determination is the assignment of resonances and NOEs to corresponding nuclei. Structure-based assignment (SBA) uses a model structure (“template”) for the target protein to expedite this process. Nuclear vector replacement (NVR) is an SBA framework that combines multiple sources of NMR data (chemical shifts, RDCs, sparse NOEs, amide exchange rates, TOCSY) and has high accuracy when the template is close to the target protein’s structure (less than 2 Å backbone RMSD). However, a close template may not always be available. We extend the circle of convergence of NVR for distant templates by using an ensemble of structures. This ensemble corresponds to the low-frequency perturbations of the given template and is obtained using normal mode analysis (NMA). Our algorithm assigns resonances and sparse NOEs using each of the structures in the ensemble separately, and aggregates the results using a voting scheme based on maximum bipartite matching. Experimental results on human ubiquitin using four distant template structures show an increase in assignment accuracy. Our algorithm also improves the robustness of NVR with respect to structural noise. We provide a confidence measure for each assignment using the percentage of the structures that agree on that assignment. We use this measure to assign a subset of the peaks with even higher accuracy. We further validate our algorithm on data for two additional proteins with NVR. We then show the general applicability of our approach by applying our NMA ensemble-based voting scheme to another SBA tool, MARS. For three test proteins with corresponding templates, including the 370-residue maltose binding protein, we increase the number of reliable assignments made by MARS. Finally, we show that our voting scheme is sound and optimal by proving that it is a maximum likelihood estimator of the correct assignments.

Abbreviations

bb RMSD: Backbone root mean square distance
BPG: Bipartite graph
CS: Chemical shift
EIN: N-terminal domain of enzyme I
EM: Expectation-maximization
GαIP: G-α interacting protein
HD: Homology detection
MBM: Maximum bipartite matching
MBP: Maltose-binding protein
MLE: Maximum likelihood estimator
MR: Molecular replacement
NMA: Normal mode analysis
NMR: Nuclear magnetic resonance
NOE: Nuclear Overhauser effect
NVR: Nuclear vector replacement
PR: Pseudoresidue
RDC: Residual dipolar coupling
SBA: Structure-based assignment
SPG: Streptococcal protein G

References

  • Al-Hashimi H, Gorin A, Majumdar A, Gosser Y, Patel D (2002) Towards structural genomics of RNA: rapid NMR resonance assignment and simultaneous RNA tertiary structure determination using residual dipolar couplings. J Mol Biol 318(3):637–649
  • Al-Hashimi H, Patel D (2002) Residual dipolar couplings: synergy between NMR and structural genomics. J Biomol NMR 22(1):1–8
  • Bahar I, Atılgan A, Erman B (1997) Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des 2(3):173–181
  • Bailey-Kellogg C, Chainraj S, Pandurangan G (2004) A random graph approach to NMR sequential assignment. In: RECOMB, San Diego, CA, pp 58–67
  • Best R, Vendruscolo M (2004) Determination of protein structures consistent with NMR order parameters. J Am Chem Soc 126(26):8090–8091
  • Conitzer V, Sandholm T (2005) Common voting rules as maximum likelihood estimators. In: Proceedings of the 21st annual conference on uncertainty in artificial intelligence (UAI-05), Edinburgh, Scotland, UK, pp 145–152
  • Cornilescu G, Marquardt J, Ottiger M, Bax A (1998) Validation of protein structure from anisotropic carbonyl chemical shifts in a dilute liquid crystalline phase. J Am Chem Soc 120(27):6836–6837
  • De Alba E, De Vries L, Farquhar MG, Tjandra N (1999) Solution structure of GaIP (galpha interacting protein): a regulator of G protein signaling. J Mol Biol 291(4):927
  • de Caritat (Marquis de Condorcet) MJAN (1785) Essai sur l’application de l’analyse à la probabilité des décisions rendues à la pluralité des voix. L’Imprimerie Royale, Paris
  • Ferentz AE, Wagner G (2000) NMR spectroscopy: a multifaceted approach to macromolecular structure. Q Rev Biophys 33(1):29–65
  • Güntert P (2004) Automated NMR structure calculation with CYANA. Methods Mol Biol 278:353–378
  • Harris R (2002) The ubiquitin NMR resource page, BBSRC Bloomsbury Center for Structural Biology, http://www.biochem.ucl.ac.uk/bsm/nmr/ubq/index.html. Cited 02 Jun 2007
  • Holm L, Sander C (1991) Database algorithm for generating protein backbone and side-chain coordinates from a Cα trace: application to model building and detection of coordinate errors. J Mol Biol 218(1):183–194
  • Hus J, Prompers J, Brüschweiler R (2002) Assignment strategy for proteins of known structure. J Mag Res 157(1):119–125
  • Jung Y-S, Zweckstetter M (2004a) Mars—robust automatic backbone assignment of proteins. http://www.mpibpc.mpg.de/groups/griesinger/zweckstetter/_links/software_mars.htm. Cited 02 Jun 2007
  • Jung Y, Zweckstetter M (2004b) Backbone assignment of proteins with known structure using residual dipolar couplings. J Biomol NMR 30(1):25–35
  • Jung Y, Zweckstetter M (2004c) Mars—robust automatic backbone assignment of proteins. J Biomol NMR 30(1):11–23
  • Kay L (1998) Protein dynamics from NMR. Nat Struct Biol 5(Suppl):513–517
  • Krebs W, Alexandrov V, Wilson C, Echols N, Yu H, Gerstein M (2002) Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic. Proteins Struct Funct Genet 48(4):682–695
  • Kuhn H (1955) The Hungarian method for the assignment problem. Nav Res Logist Quart 2:83–97
  • Kuszewski J, Gronenborn AM, Clore GM (1999) Improving the packing and accuracy of NMR structures with a pseudopotential for the radius of gyration. J Am Chem Soc 121(10):2337–2338
  • Langmead C, Donald B (2004a) An expectation/maximization nuclear vector replacement algorithm for automated NMR resonance assignments. J Biomol NMR 29(2):111–138
  • Langmead C, Donald B (2004b) High-throughput 3D structural homology detection via NMR resonance assignment. In: Proc IEEE Comput Syst Bioinform Conf, Stanford, CA, pp 278–89. PMID: 16448021
  • Langmead C, Yan A, Lilien R, Wang L, Donald B (2003) A polynomial-time nuclear vector replacement algorithm for automated NMR resonance assignments. In: Proc the seventh annual international conference on research in computational molecular biology (RECOMB). ACM Press, Berlin, Germany, April 10–13, pp 176–187. Appears in: J Comput Biol 11(2–3):277–98 (2004)
  • Leo-Macias A, Lopez-Romero P, Lupyan D, Zerbino D, Ortiz A (2005) An analysis of core deformations in protein superfamilies. Biophys J 88(2):1291–1299
  • Meiler J, Baker D (2003) Rapid protein fold determination using unassigned NMR data. Proc Nat Acad Sci USA 100(26):15404–15409
  • Mumenthaler C, Güntert P, Braun W, Wüthrich K (1997) Automated combined assignment of NOESY spectra and three-dimensional protein structure determination. J Biomol NMR 10(4):351–362
  • Neal S, Nip AM, Zhang H, Wishart DS (2003) Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J Biomol NMR 26(3):215–240
  • Pearlman D, Case D, Caldwell J, Ross W, Cheatham T III, DeBolt S, Ferguson D, Seibel G, Kollman P (1995) Amber, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Comp Phys Commun 91(1–3):1–41
  • Potluri S, Yan A, Chou J, Donald B, Bailey-Kellogg C (2006) Structure determination of symmetric homo-oligomers by a complete search of symmetry configuration space, using NMR restraints and van der Waals packing. Proteins 65(1):203–219
  • Potluri S, Yan A, Donald B, Bailey-Kellogg C (2007) A complete algorithm to resolve ambiguity for inter-subunit NOE assignment in structure determination of symmetric homo-oligomers. Protein Sci 16(1):69–81
  • Rossmann M, Blow D (1962) The detection of sub-units within the crystallographic asymmetric unit. Acta Crystal (D) 15:24–31
  • Ruan K, Tolman JR (2005) Composite alignment media for the measurement of independent sets of NMR residual dipolar couplings. J Am Chem Soc 127(43):15032–15033
  • Sali A, Blundell T (1993) Comparative protein modelling by satisfaction of spatial restraints. J Mol Biol 234(3):779–815
  • Seavey B, Farr E, Westler W, Markley J (1991) A relational database for sequence-specific protein NMR data. J Biomol NMR 1(3):217–236
  • Shehu A, Clementi C, Kavraki LE (2006) Modeling protein conformational ensembles: From missing loops to equilibrium fluctuations. Proteins Struct Funct Bioinform 65(1):164–179
  • Shindyalov I, Bourne P (1998) Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng 11(9):739–747
  • Suhre K, Sanejouand Y (2004a) Elnémo: a normal mode web-server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res 32(1):W610–W614. http://www.igs.cnrs-mrs.fr/elnemo/. Cited 12 Jun 2007
  • Suhre K, Sanejouand Y (2004b) On the potential of normal mode analysis for solving difficult molecular replacement problems. Acta Crystal (D) 60(4):796–799
  • Tjandra N, Bax A (1997) Direct measurement of distances and angles in biomolecules by NMR in a dilute liquid crystalline medium. Science 278(5340):1111–1114
  • Tolman JR, Flanagan JM, Kennedy MA, Prestegard JH (1995) Nuclear magnetic dipole interactions in field-oriented proteins: Information for structure determination in solution. Proc Natl Acad Sci USA 92(20):9279–9283
  • Vitek O, Bailey-Kellogg C, Craig B, Kuliniewicz P, Vitek J (2005) Reconsidering complete search algorithms for protein backbone NMR assignment. Bioinformatics 21(Suppl2):ii230–ii236
  • Wasserman L (2004) All of statistics: a concise course in statistical inference (Springer Texts in Statistics). Springer
  • Wells S, Menor S, Hespenheide B, Thorpe M (2005) Constrained geometric simulation of diffusive motion in proteins. Phys Biol 2(4):S127–S136
  • Xu XP, Case DA (2001) Automated prediction of 15N, 13Cα, 13Cβ and 13C chemical shifts in proteins using a density functional database. J Biomol NMR 21(4):321–333
  • Xu Y, Xu D, Kai D, Olman V, Razumovskaya J, Jiang T (2002) Automated assignment of backbone NMR peaks using constrained bipartite matching. Comput Sci Eng 4(1). Life Sci Div, Oak Ridge Nat Lab, TN
  • Young P (1995) Optimal voting rules. J Econ Perspect 9(1):51–64

Acknowledgments

We thank Drs. C. Bailey-Kellogg, P. Zhou, Mr. D. Keedy, Mr. J. MacMaster, Mr. C. Tripathy, Mr. A. Yan, Mr. M. Zeng and all members of the Donald Lab for discussions and comments. This work is supported by a grant to B.R.D. from the National Institutes of Health (R01 GM-65982).

Author information

Corresponding author

Correspondence to Bruce Randall Donald.

Appendix

Analysis of MBM as a voting rule

Motivation

In this section, we justify our use of MBM as a voting rule in SBA to combine the assignments for each structure in the NMA ensemble. To that end, we show that MBM is a maximum likelihood estimator (MLE). Maximum likelihood estimation is a general technique to estimate the unknown parameters of a distribution, given a set of observed data values derived from the distribution. It returns the parameters that maximize the likelihood of observing the set of data values.

Our proof demonstrates that our voting scheme is sound and optimal, by showing that our algorithm returns the assignment that maximizes the likelihood. In our case, the set of observed data values comprises the assignments for each model in our NMA ensemble, and the unknown parameters of the distribution are the unknown correct assignments. An MLE has many desirable properties. In particular, it is consistent, which means that it converges to the true value of the estimated parameter (Wasserman 2004). Hence, as the number of models in our NMA ensemble increases, the assignments returned by our voting scheme converge to the correct assignments. This proof depends on our assumption that the assignments computed for each model are independent and identically distributed, according to our noise model (which is described below).

First, we formulate our algorithm as a voting scheme. In voting, there are multiple voters and multiple candidates. Each voter may vote for one (or a subset) of the candidates, or may rank the alternatives. In our setting, a vote is the set of resonance assignments computed for one structure in our NMA ensemble. Our voting scheme aggregates these preferences to compute “consensus” assignments, which are returned by our algorithm.

The idea of using MLE in voting was first proposed by de Caritat (Marquis de Condorcet) (1785), who analyzed 2- and 3-candidate elections, and was extended two centuries later to an arbitrary number of candidates by Young (1995). However, none of the voting rules studied in these works corresponds to a widely used voting rule. Conitzer and Sandholm (2005) then studied which of the well-known voting rules can be viewed as an MLE. For this purpose, they adopted the following model: there exists an (unknown) ground truth winner (or ranking) of the election, \(w\), and each voter’s vote is a noisy measurement of this ground truth, so it may differ from the ground truth. The noise model specifies the probability of observing a vote \(v_i\) for voter \(i\), given the ground truth winner \(w\). The votes are independent given \(w\), and identically distributed. Under these assumptions, given a set of votes \(v_1, \ldots, v_m\), where \(m\) is the number of votes, a voting rule is an MLE of the correct winner \(w\) if it returns a winner \(w_o\) that maximizes the likelihood of the observed votes. That is, it returns:

$$ \hbox{arg}\mathop{\hbox{max}}\limits_{w_o} p(v_1,v_2,\ldots,v_m|w_o) = \hbox{arg}\mathop{\hbox{max}}\limits_{w_o} {p(v_1|w_o) p(v_2|w_o) \ldots p(v_m|w_o)} $$

where \(p(v_1, v_2, \ldots, v_m \mid w_o)\) is the probability of observing \(v_1, v_2, \ldots, v_m\) if the (unknown) ground truth were \(w_o\).
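
As a minimal illustration of this definition (not taken from Conitzer and Sandholm's analysis), the sketch below scores every candidate by the log-likelihood of the observed votes and returns the maximizer. The noise model `simple_noise`, in which a voter names the true winner with probability p and each other candidate with equal probability, is an assumption chosen for simplicity, and all function names are hypothetical.

```python
import math

def mle_winner(votes, candidates, likelihood):
    """Return the candidate w that maximizes sum_i log p(v_i | w)."""
    def log_lik(w):
        return sum(math.log(likelihood(v, w)) for v in votes)
    return max(candidates, key=log_lik)

def simple_noise(p, n_candidates):
    """Assumed noise model: the true winner is named with probability p,
    each of the other candidates with probability q = (1 - p) / (n - 1)."""
    q = (1.0 - p) / (n_candidates - 1)
    return lambda vote, truth: p if vote == truth else q

candidates = ["A", "B", "C"]
votes = ["A", "B", "A", "C", "A"]          # one vote per voter
winner = mle_winner(votes, candidates, simple_noise(0.5, len(candidates)))
print(winner)  # prints "A"
```

Under this particular noise model with p > q, the likelihood of a candidate depends only on how many votes name it, so the maximizer is the plurality winner.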

Proof that MBM is an MLE

We now show that our voting rule, MBM, is the MLE of the correct assignments. In our setting, there is a ground truth winner, which is the correct (and unknown) (peak, residue) assignments. The individual votes correspond to the assignments made using each of the structures in the NMA ensemble separately using an SBA algorithm (Fig. 2). The MBM is done on a BPG where one set of nodes corresponds to peaks and the other set to residues. The edge weights are the number of structures that assign (“vote for”) the corresponding (peak, residue) pair.
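
The construction above can be sketched on a toy example. The code below counts votes per (peak, residue) pair, finds the maximum-weight matching, and reports a per-peak confidence as the fraction of structures that agree with the consensus. It is a sketch under stated assumptions: the peak and residue labels and the function names are hypothetical, and the matching is found by exhaustive search, which is only practical for a handful of peaks. For realistic sizes a polynomial-time method such as the Hungarian method (Kuhn 1955) would replace the search.

```python
from collections import Counter
from itertools import permutations

def vote_weights(votes):
    """Edge weights of the BPG: number of structures voting for each pair."""
    weights = Counter()
    for assignment in votes:
        for peak, residue in assignment.items():
            weights[(peak, residue)] += 1
    return weights

def consensus_assignment(votes, peaks, residues):
    """Maximum-weight matching by exhaustive search (needs len(peaks) <= len(residues))."""
    weights = vote_weights(votes)
    best_total, best = -1, None
    for chosen in permutations(residues, len(peaks)):
        total = sum(weights[(p, r)] for p, r in zip(peaks, chosen))
        if total > best_total:
            best_total, best = total, dict(zip(peaks, chosen))
    # Confidence: fraction of structures that agree with the consensus pair.
    confidence = {p: weights[(p, r)] / len(votes) for p, r in best.items()}
    return best, best_total, confidence

peaks = ["pk1", "pk2", "pk3"]
residues = ["R1", "R2", "R3"]
votes = [  # one assignment per structure in the (toy) ensemble
    {"pk1": "R1", "pk2": "R2", "pk3": "R3"},
    {"pk1": "R1", "pk2": "R3", "pk3": "R2"},
    {"pk1": "R2", "pk2": "R1", "pk3": "R3"},
    {"pk1": "R1", "pk2": "R2", "pk3": "R3"},
]
best, total, confidence = consensus_assignment(votes, peaks, residues)
print(best, total, confidence)
```

Here the consensus is pk1→R1, pk2→R2, pk3→R3 with total weight 8, and pk2 receives the lowest confidence (2 of 4 structures).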

We assume the following noise model: For each template in the ensemble (“voter”), each peak is correctly (resp., incorrectly) assigned with probability p (resp., q), where p > q, independently of the other peaks; if the resulting assignments have more than one peak assigned to the same residue, each peak is reassigned (again with probability p to the correct residue and probability q to an incorrect residue). We further assume that the assignments corresponding to individual models are independent, given the correct assignments. So, the noise is independent and identically distributed.
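
One way to read this noise model is as rejection sampling, sketched below. Two interpretive assumptions are made here and are not stated in the text: q is taken as the weight of each individual incorrect residue, and a collision causes the whole vote to be redrawn. The function name and the toy labels are hypothetical.

```python
import random

def sample_vote(truth, residues, p, q, rng):
    """Draw one noisy vote; redraw all peaks whenever two peaks collide."""
    while True:
        vote = {}
        for peak, correct in truth.items():
            # Weight p for the correct residue, q for each incorrect one.
            weights = [p if r == correct else q for r in residues]
            vote[peak] = rng.choices(residues, weights=weights, k=1)[0]
        if len(set(vote.values())) == len(vote):  # no residue used twice
            return vote

rng = random.Random(0)  # seeded for reproducibility
truth = {"pk1": "R1", "pk2": "R2", "pk3": "R3"}
residues = ["R1", "R2", "R3"]
votes = [sample_vote(truth, residues, p=0.6, q=0.2, rng=rng) for _ in range(5)]
```

Under this reading, a collision-free vote with k correct peaks out of n is produced with probability proportional to p^k q^(n-k).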

With this noise model, the probability of a given assignment (vote) i in which \(k_i\) of the n peaks are matched correctly is (proportional to) \(p^{k_i} q^{n-k_i}.\) The joint probability of all m votes corresponding to all m templates together is proportional to

$$ \begin{aligned} \prod_i p^{k_i} q^{n-k_i} &= p^a q^b, \\ a &= \sum_i k_i, \\ b &= nm - \sum_i k_i \end{aligned} $$
(2)

where the product and the sums in (2) are from i = 1,…,m.

An MLE of the correct assignment chooses an assignment \(w_o\) such that (2) is maximized. Fix a particular protein and its NMA ensemble, so that n and m are constants. Then \(p^a q^b = q^{nm} (p/q)^a\), and since p > q and nm is a constant, (2) is maximized when \(a = \sum_i k_i\) is maximized. \(\sum_i k_i\) is the number of (peak, residue) assignments, counted over all structural models, that coincide with the (peak, residue) assignment in \(w_o\); equivalently, it is the total edge weight of \(w_o\) viewed as a matching in the BPG of vote counts. This is maximized by MBM. Therefore MBM is an MLE of the correct assignment.\(\hfill\square\)
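
The argument can be checked numerically on a small case. The sketch below enumerates every candidate assignment for three peaks, computes the agreement count a and the log of expression (2), and compares the two maximizers. The toy votes, the values p = 0.6 and q = 0.2, and the function names are illustrative assumptions.

```python
import math
from itertools import permutations

def agreement(candidate, votes):
    """a = sum_i k_i: pairs in the votes that coincide with the candidate."""
    return sum(vote[peak] == res
               for vote in votes
               for peak, res in candidate.items())

def log_likelihood(candidate, votes, p, q):
    """log of p^a q^b with b = n*m - a, as in expression (2)."""
    n, m = len(candidate), len(votes)
    a = agreement(candidate, votes)
    return a * math.log(p) + (n * m - a) * math.log(q)

peaks = ["pk1", "pk2", "pk3"]
residues = ["R1", "R2", "R3"]
votes = [
    {"pk1": "R1", "pk2": "R2", "pk3": "R3"},
    {"pk1": "R1", "pk2": "R3", "pk3": "R2"},
    {"pk1": "R2", "pk2": "R1", "pk3": "R3"},
    {"pk1": "R1", "pk2": "R2", "pk3": "R3"},
]
candidates = [dict(zip(peaks, perm)) for perm in permutations(residues)]

best_by_votes = max(candidates, key=lambda w: agreement(w, votes))
best_by_lik = max(candidates, key=lambda w: log_likelihood(w, votes, 0.6, 0.2))
print(best_by_votes == best_by_lik)  # prints True
```

Because the log-likelihood is an increasing function of a whenever p > q, ranking candidates by either quantity gives the same order, not just the same maximizer.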

About this article

Cite this article

Apaydın, M.S., Conitzer, V. & Donald, B.R. Structure-based protein NMR assignments using native structural ensembles. J Biomol NMR 40, 263–276 (2008). https://doi.org/10.1007/s10858-008-9230-x

  • DOI: https://doi.org/10.1007/s10858-008-9230-x

Keywords

  • Automated NMR assignments
  • Normal mode analysis
  • NMR structural biology
  • Protein flexibility via structural ensembles
  • Structural bioinformatics