A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data

  • Yuanpeng Janet Huang
  • Kelly P. Brock
  • Chris Sander
  • Debora S. Marks
  • Gaetano T. Montelione
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1105)


While 3D structure determination of small (<15 kDa) proteins by solution NMR is largely automated and routine, structural analysis of larger proteins is more challenging. An emerging hybrid strategy for modeling protein structures combines sparse NMR data, which can be obtained for larger proteins, with sequence co-variation data, called evolutionary couplings (ECs), obtained from multiple sequence alignments of protein families. This hybrid “EC-NMR” method can be used to accurately model larger (15–60 kDa) proteins, and to determine structures of smaller (5–15 kDa) proteins more rapidly using only backbone NMR data. The resulting structures have accuracies, relative to reference structures, comparable to those obtained with full backbone and sidechain NMR resonance assignments. The requirement that ECs be consistent with NMR data recorded on a specific member of a protein family, under specific conditions, potentially also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
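As a rough illustration of the consistency requirement described above, the following minimal sketch partitions EC pairs by whether they are satisfied in a structure model. It is not the EC-NMR implementation: the function name `partition_ecs`, the toy Cα coordinates, the EC scores, and the 8 Å contact cutoff are all hypothetical choices made for illustration.

```python
import math


def partition_ecs(ec_pairs, ca_coords, cutoff=8.0):
    """Split EC pairs into those satisfied or violated in a model.

    ec_pairs:  list of (i, j, score) residue-index pairs (hypothetical input).
    ca_coords: dict mapping residue index -> (x, y, z) C-alpha position in angstroms.
    cutoff:    assumed contact distance threshold in angstroms.
    """
    consistent, inconsistent = [], []
    for i, j, score in ec_pairs:
        # Euclidean distance between the two C-alpha atoms in the model
        d = math.dist(ca_coords[i], ca_coords[j])
        target = consistent if d <= cutoff else inconsistent
        target.append((i, j, score, d))
    return consistent, inconsistent


# Toy data (assumed values, not taken from any real protein)
ca_coords = {1: (0.0, 0.0, 0.0), 5: (3.0, 4.0, 0.0), 20: (0.0, 0.0, 30.0)}
ec_pairs = [(1, 5, 0.9), (1, 20, 0.7)]

consistent, inconsistent = partition_ecs(ec_pairs, ca_coords)
print("consistent:", consistent)
print("inconsistent:", inconsistent)
```

In this sketch, pairs that land in the violated list would be candidates for either false-positive couplings or contacts formed in an alternative state; the code itself does not distinguish between those two cases.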


Keywords: Hybrid methods; Protein NMR spectroscopy; Protein families; Multiple sequence alignment; Maximum entropy; Evolutionary couplings; Automated NMR data analysis; AutoStructure/ASDP



This work was supported by National Institutes of Health grants 1R01-GM120574 (to G.T.M.) and 1R01-GM106303 (to C.S. & D.M.). We thank all of the members of the Northeast Structural Genomics Consortium who generated and archived NMR data used in this work, particularly scientists in the laboratories of C. Arrowsmith, M. Kennedy, G.T. Montelione, T. Szyperski, and J. Prestegard.



Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Yuanpeng Janet Huang (1)
  • Kelly P. Brock (3)
  • Chris Sander (2, 3)
  • Debora S. Marks (4)
  • Gaetano T. Montelione (1), corresponding author

  1. Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, USA
  2. Department of Cell Biology, Harvard Medical School, Boston, USA
  3. cBio Center, Dana-Farber Cancer Institute, Boston, USA
  4. Department of Systems Biology, Harvard Medical School, Boston, USA
