Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations

  • Ricardo Nascimento dos Santos
  • Xianli Jiang
  • Leandro Martínez
  • Faruck Morcos
Part of the Methods in Molecular Biology book series (MIMB, volume 1851)


The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and can therefore assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve the stability of protein architecture and function, and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can then be used to guide molecular dynamics (MD) simulations that predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information in combination with computational physical models to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of the software tools and databases required to infer the tertiary structure of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) how to estimate secondary structure and include such estimates in the topology files required by the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline. The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
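
As a rough illustration of the statistical idea behind correlated mutations, the sketch below ranks pairs of alignment columns by mutual information. This is a deliberately simpler covariance measure than DCA, which additionally separates direct couplings from indirect correlations. The toy alignment, the function names, and the `min_separation` parameter are assumptions made for illustration and do not come from the chapter.

```python
from collections import Counter
from itertools import combinations
from math import log


def column_mutual_information(msa, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    `msa` is a list of equal-length aligned sequences. No sequence
    reweighting or pseudocounts are applied, so this is only a toy measure.
    """
    n = len(msa)
    freq_i = Counter(seq[i] for seq in msa)
    freq_j = Counter(seq[j] for seq in msa)
    freq_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log(p_ab * n * n / (freq_i[a] * freq_j[b]))
    return mi


def rank_pairs(msa, min_separation=1):
    """Return ((i, j), score) tuples sorted from highest to lowest score.

    `min_separation` (hypothetical parameter) skips column pairs that are
    closer than the given distance along the sequence.
    """
    length = len(msa[0])
    scores = [
        ((i, j), column_mutual_information(msa, i, j))
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 3 covary perfectly
    # (A pairs with E, D pairs with R); the other columns do not.
    toy_msa = ["AKLE", "AKLE", "DKLR", "DKLR", "AKVE", "DKVR"]
    for pair, score in rank_pairs(toy_msa):
        print(pair, round(score, 3))
```

In this toy case the highest-ranked pair is the one that covaries. On real protein families, a measure of this kind also picks up indirect, chained correlations, which is the limitation that direct coupling methods are designed to address.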

Key words

Coevolution; Structure-based model; Energy landscapes; Molecular dynamics; Protein folding; Structure prediction



The authors acknowledge financial support from the São Paulo Research Foundation (FAPESP) (grants 2015/13667-9, 2010/16947-9, 2013/05475-7, and 2013/08293-7) and funding from the University of Texas at Dallas.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Ricardo Nascimento dos Santos (1)
  • Xianli Jiang (2)
  • Leandro Martínez (1)
  • Faruck Morcos (2, 3; corresponding author)

  1. Institute of Chemistry, University of Campinas (UNICAMP), Campinas, Brazil
  2. Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
  3. Center for Systems Biology, University of Texas at Dallas, Richardson, USA
