Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1851))

Abstract

The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and can therefore assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve the stability of protein architecture and function, and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can then be used to guide molecular dynamics (MD) simulations that predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information, in combination with computational physical models, to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of the software tools and databases required to infer the tertiary structure of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) how to estimate secondary structure and include such estimates in the topology files required by the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline. The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
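To make the idea of a statistical signature of correlated mutations concrete, the following minimal sketch scores pairs of alignment columns by mutual information and ranks them. This is not DCA: mutual information is a simpler local covariation measure, whereas DCA additionally separates direct couplings from indirect correlations. The toy alignment and the names `column_mi`, `rank_pairs` and `min_separation` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)               # single-site counts, column i
    fj = Counter(seq[j] for seq in msa)               # single-site counts, column j
    fij = Counter((seq[i], seq[j]) for seq in msa)    # pair counts
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def rank_pairs(msa, min_separation=1):
    """Return (score, i, j) tuples sorted from strongest to weakest covariation.

    min_separation is an illustrative filter on sequence distance |i - j|.
    """
    length = len(msa[0])
    scores = [
        (column_mi(msa, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    scores.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scores


# Hypothetical toy alignment: columns 0 and 3 mutate together (A with E, D with K),
# column 1 varies independently, column 2 is fully conserved.
toy_msa = [
    "AKLE",
    "AKLE",
    "DKLK",
    "DKLK",
    "ARLE",
    "DRLK",
]

ranked = rank_pairs(toy_msa)
top_score, top_i, top_j = ranked[0]
print(f"top pair: columns {top_i} and {top_j}, MI = {top_score:.3f} nats")
```

In a real application the alignment would contain thousands of homologous sequences, and the highest-ranked pairs separated by several positions in sequence would be the candidates passed on as contacts to the structure-based model.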

Acknowledgments

The authors acknowledge financial support from the São Paulo Research Foundation (FAPESP) (Grants 2015/13667-9, 2010/16947-9, 2013/05475-7, and 2013/08293-7) and funding from the University of Texas at Dallas.

Author information

Corresponding author

Correspondence to Faruck Morcos.

Copyright information

© 2019 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

dos Santos, R.N., Jiang, X., Martínez, L., Morcos, F. (2019). Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations. In: Sikosek, T. (eds) Computational Methods in Protein Evolution. Methods in Molecular Biology, vol 1851. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-8736-8_5

  • DOI: https://doi.org/10.1007/978-1-4939-8736-8_5

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-8735-1

  • Online ISBN: 978-1-4939-8736-8

  • eBook Packages: Springer Protocols
