A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data

  • Yuanpeng Janet Huang
  • Kelly P. Brock
  • Chris Sander
  • Debora S. Marks
  • Gaetano T. Montelione (Email author)
Chapter
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1105)

Abstract

While 3D structure determination of small (<15 kDa) proteins by solution NMR is largely automated and routine, structural analysis of larger proteins is more challenging. An emerging hybrid strategy for modeling protein structures combines sparse NMR data, which can be obtained for larger proteins, with sequence co-variation data, called evolutionary couplings (ECs), obtained from multiple sequence alignments of protein families. This hybrid “EC-NMR” method can be used to accurately model larger (15–60 kDa) proteins, and to determine structures of smaller (5–15 kDa) proteins more rapidly using only backbone NMR data. The resulting structures have accuracies, relative to reference structures, comparable to those obtained with full backbone and sidechain NMR resonance assignments. Requiring that ECs be consistent with NMR data recorded on a specific member of a protein family, under specific conditions, potentially also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.

Keywords

Hybrid methods; Protein NMR spectroscopy; Protein families; Multiple sequence alignment; Maximum entropy; Evolutionary couplings; Automated NMR data analysis; AutoStructure/ASDP

Notes

Acknowledgements

This work was supported by National Institutes of Health grants 1R01-GM120574 (to G.T.M.) and 1R01-GM106303 (to C.S. and D.M.). We thank all of the members of the Northeast Structural Genomics Consortium who generated and archived NMR data used in this work, particularly scientists in the laboratories of C. Arrowsmith, M. Kennedy, G.T. Montelione, T. Szyperski, and J. Prestegard.

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Yuanpeng Janet Huang (1)
  • Kelly P. Brock (3)
  • Chris Sander (2, 3)
  • Debora S. Marks (4)
  • Gaetano T. Montelione (1, Email author)

  1. Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, USA
  2. Department of Cell Biology, Harvard Medical School, Boston, USA
  3. cBio Center, Dana-Farber Cancer Institute, Boston, USA
  4. Department of Systems Biology, Harvard Medical School, Boston, USA
