Direct Coupling Analysis for Protein Contact Prediction

  • Faruck Morcos
  • Terence Hwa
  • José N. Onuchic
  • Martin Weigt
Part of the Methods in Molecular Biology book series (MIMB, volume 1137)


During evolution, the structure and function of proteins are remarkably conserved, whereas amino-acid sequences vary strongly between homologous proteins. Structural conservation constrains sequence variability and forces different residues to coevolve, i.e., to show correlated patterns of amino-acid occurrences. However, a correlation between two residues may arise either from direct coupling, e.g., through a contact in the folded protein, or indirectly, via intermediate residues. To use empirically observed correlations for predicting residue–residue contacts, direct and indirect effects must be disentangled. Here we present the mechanistic details of how to achieve this with a methodology called Direct Coupling Analysis (DCA). DCA has been shown to identify, with high accuracy, amino-acid pairs that are under direct, reciprocal evolutionary constraints. Specifically, we provide instructions and protocols for using the algorithmic implementations of DCA, from data extraction to the visualization of predicted contacts in contact maps or representative protein structures.
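As a minimal sketch of the mean-field variant of DCA, the pure-Python code below computes regularized single-site and pair frequencies, inverts the connected correlation matrix to obtain direct couplings, and scores each column pair by direct information (DI). It assumes an alignment already encoded as integers 0..q-1, omits sequence reweighting, and expresses the pseudocount as a weight `lam`; the names `mean_field_dca` and `invert` and all defaults are illustrative assumptions, not the chapter's released software. A realistic family (q = 21, hundreds of columns, thousands of sequences) would need a vectorized implementation.

```python
import math
from itertools import combinations


def invert(A):
    """Gauss-Jordan inversion with partial pivoting (adequate for small toy matrices)."""
    n = len(A)
    aug = [list(row) + [float(r == c) for c in range(n)] for r, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def mean_field_dca(msa, q, lam=0.5, min_sep=5, tol=1e-10):
    """Rank column pairs (i, j) of an integer-encoded MSA by direct information.

    Hypothetical helper. msa: equal-length sequences with states in range(q).
    lam: pseudocount weight in [0, 1); min_sep: keep only pairs with j - i >= min_sep.
    Sequence reweighting is deliberately omitted in this sketch.
    """
    M, L, Q = len(msa), len(msa[0]), q - 1

    # Regularized single-site frequencies f_i(a).
    fi = [[lam / q + (1 - lam) * sum(s[i] == a for s in msa) / M for a in range(q)]
          for i in range(L)]

    def fij(i, j, a, b):
        # Regularized pair frequency; on the diagonal f_ii(a, b) = f_i(a) * delta_ab.
        if i == j:
            return fi[i][a] if a == b else 0.0
        count = sum(s[i] == a and s[j] == b for s in msa)
        return lam / q**2 + (1 - lam) * count / M

    # Connected correlations C_ij(a, b) over states 0..q-2; state q-1 fixes the gauge.
    C = [[fij(i, j, a, b) - fi[i][a] * fi[j][b]
          for j in range(L) for b in range(Q)]
         for i in range(L) for a in range(Q)]
    Cinv = invert(C)  # mean-field couplings: e_ij(a, b) = -(C^-1)_{ia, jb}

    def direct_information(i, j):
        # Pair weights exp(e_ij(a, b)); the gauge state q-1 has zero coupling.
        W = [[math.exp(-Cinv[i * Q + a][j * Q + b]) if a < Q and b < Q else 1.0
              for b in range(q)] for a in range(q)]
        pi, pj = fi[i], fi[j]
        mu1, mu2 = [1.0 / q] * q, [1.0 / q] * q
        for _ in range(10000):
            # Alternating (Sinkhorn-style) updates of the two local field factors
            # so that the two-site model reproduces the marginals f_i and f_j.
            new1 = [pi[a] / sum(W[a][b] * mu2[b] for b in range(q)) for a in range(q)]
            s1 = sum(new1)
            new1 = [x / s1 for x in new1]
            new2 = [pj[b] / sum(W[a][b] * new1[a] for a in range(q)) for b in range(q)]
            s2 = sum(new2)
            new2 = [x / s2 for x in new2]
            diff = max(abs(x - y) for x, y in zip(new1 + new2, mu1 + mu2))
            mu1, mu2 = new1, new2
            if diff < tol:
                break
        P = [[W[a][b] * mu1[a] * mu2[b] for b in range(q)] for a in range(q)]
        Z = sum(map(sum, P))
        # DI = sum_ab P_dir(a, b) * ln(P_dir(a, b) / (f_i(a) f_j(b)))
        return sum((P[a][b] / Z) * math.log(P[a][b] / Z / (pi[a] * pj[b]))
                   for a in range(q) for b in range(q))

    scores = [(direct_information(i, j), i, j)
              for i, j in combinations(range(L), 2) if j - i >= min_sep]
    return sorted(scores, reverse=True)
```

In a toy alignment where column 5 copies column 0 and all other columns vary independently, only the pair (0, 5) receives a nonzero DI. For a real family, the top-ranked pairs (with `min_sep` chosen to exclude residues close in sequence, an assumption here) are the candidate contacts one would plot on a contact map.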


Keywords: Direct coupling analysis · Maximum entropy · Contact prediction · Residue–residue interactions · Coevolution · Direct correlations · Statistical inference



This work was supported by the Center for Theoretical Biological Physics sponsored by the NSF (Grant PHY-0822283) and by NSF-MCB-1214457. JNO is a CPRIT Scholar in Cancer Research sponsored by the Cancer Prevention and Research Institute of Texas.



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Faruck Morcos (1)
  • Terence Hwa (2)
  • José N. Onuchic (1)
  • Martin Weigt (3)

  1. Center for Theoretical Biological Physics, Rice University, Houston, USA
  2. Center for Theoretical Biological Physics, University of California at San Diego, La Jolla, USA
  3. UMR7238, Laboratoire de Génomique des Microorganismes, Université Pierre et Marie Curie, Paris, France
