Prediction of Structures and Interactions from Genome Information

  • Sanzo Miyazawa
Part of the Advances in Experimental Medicine and Biology book series (AEMB, volume 1105)


Predicting three dimensional residue-residue contacts from evolutionary information in protein sequences was attempted already in the early 1990s. However, contact prediction accuracies of methods evaluated in CASP experiments before CASP11 remained quite low, typically with <20% true positives. Recently, contact prediction has been significantly improved to the level that an accurate three dimensional model of a large protein can be generated on the basis of predicted contacts. This improvement was attained by disentangling direct from indirect correlations in amino acid covariations or cosubstitutions between sites in protein evolution. Here, we review statistical methods for extracting causative correlations and various approaches to describe protein structure, complex, and flexibility based on predicted contacts.


Contact prediction Direct coupling Amino acid covariation Amino acid cosubstitution Partial correlation Maximum entropy model Inverse Potts model Markov random field Boltzmann machine Deep neural network 


  1. Adhikari B, Bhattacharya D, Cao R, Cheng J (2015) CONFOLD: residue-residue contact-guided ab initio protein folding. Proteins 83:1436–1449. PubMedPubMedCentralCrossRefGoogle Scholar
  2. Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J (2016) ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinf 17:517. CrossRefGoogle Scholar
  3. Altschuh D, Vernet T, Berti P, Moras D, Nagai K (1988) Coordinated amino acid changes in homologous protein families. Protein Eng 2:193–199PubMedCrossRefGoogle Scholar
  4. Anishchenko I, Ovchinnikov S, Kamisetty H, Baker D (2013) Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci USA 114:9122–9127. CrossRefGoogle Scholar
  5. Atchley WR, Wollenberg KR, Fitch WM, Terhalle W, Dress AW (2000) Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol Biol Evol 17:164–178PubMedCrossRefGoogle Scholar
  6. Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ (2011) Learning generative models for protein fold families. Proteins 79:1061–1078. PubMedCrossRefGoogle Scholar
  7. Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, Pagnani A (2014) Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS ONE 9(3):e92721. PubMedPubMedCentralCrossRefGoogle Scholar
  8. Barton JP, Leonardis ED, Coucke A, Cocco S (2016) ACE: adaptive cluster expansion for maximum entropy graphical model inference. Bioinformatics 32:3089–3097. PubMedCrossRefGoogle Scholar
  9. Braun W, Go N (1985) Calculation of protein conformations by proton-proton distance constraints: a new efficient algorithm. J Mol Biol 186:611–626. PubMedCrossRefGoogle Scholar
  10. Brünger AT (2007) Version 1.2 of the crystallography and NMR system. Nat Protoc 2:2728–2733. PubMedCrossRefGoogle Scholar
  11. Burger L, van Nimwegen E (2008) Acurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol Syst Biol 4:165PubMedPubMedCentralCrossRefGoogle Scholar
  12. Burger L, van Nimwegen E (2010) Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput Biol 6(1):e1000633. PubMedPubMedCentralCrossRefGoogle Scholar
  13. CASP12 (2017) 12th community wide experiment on the critical assessment of techniques of protein structure prediction.
  14. Cocco S, Monasson R (2011) Adaptive cluster expansion for inferring Boltzmann machines with noisy data. Phys Rev Lett 106:090601. PubMedCrossRefGoogle Scholar
  15. Cocco S, Monasson R (2012) Adaptive cluster expansion for the inverse Ising problem: convergence, algorithm and tests. J Stat Phys 147:252–314. CrossRefGoogle Scholar
  16. Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M (2017) Inverse statistical physics of protein sequences: a key issues review. arXiv:1703.01222 [q-bio.BM]Google Scholar
  17. Doron-Faigenboim A, Pupko T (2007) A combined empirical and mechanistic codon model. Mol Biol Evol 24:388–397PubMedCrossRefGoogle Scholar
  18. Dunn SD, Wahl LM, Gloor GB (2008) Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 24:333–340PubMedCrossRefGoogle Scholar
  19. Dutheil J (2012) Detecting coevolving positions in a molecule: why and how to account for phylogeny. Brief Bioinf 13:228–243CrossRefGoogle Scholar
  20. Dutheil J, Galtier N (2007) Detecting groups of coevolving positions in a molecule: a clustering approach. BMC Evol Biol 7:242PubMedPubMedCentralCrossRefGoogle Scholar
  21. Dutheil J, Pupko T, Jean-Marie A, Galtier N (2005) A model-based approach for detecting coevolving positions in a molecule. Mol Biol Evol 22:1919–1928PubMedCrossRefGoogle Scholar
  22. Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E (2013) Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E 87:012707–1–16.
  23. Ekeberg M, Hartonen T, Aurell E (2014) Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J Comput Phys 276:341–356CrossRefGoogle Scholar
  24. Fares M, Travers S (2006) A novel method for detecting intramolecular coevolution. Genetics 173:9–23PubMedPubMedCentralCrossRefGoogle Scholar
  25. Fariselli P, Olmea O, Valencia A, Casadio R (2001) Prediction of contact maps with neural networks and correlated mutations. Protein Eng 14:835–843PubMedCrossRefGoogle Scholar
  26. Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A (2016) The Pfam protein families database: towards a more sustainable future. Nucl Acid Res 44:D279–D285. CrossRefGoogle Scholar
  27. Fitch WM, Markowitz E (1970) An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Genet 4:579–593PubMedCrossRefGoogle Scholar
  28. Fleishman SJ, Yifrach O, Ben-Tal N (2004) An evolutionarily conserved network of amino acids mediates gating in voltage-dependent potassium channels. J Mol Biol 340:307–318PubMedCrossRefGoogle Scholar
  29. Fodor AA, Aldrich RW (2004) Influence of conservation on calculations of amino acid covariance in multiple sequence alignment. Proteins 56:211–221PubMedCrossRefGoogle Scholar
  30. Giraud BG, Heumann JM, Lapedes AS (1999) Superadditive correlation. Phys Rev E 59:4973–4991CrossRefGoogle Scholar
  31. Göbel U, Sander C, Schneider R, Valencia A (1994) Correlated mutations and residue contacts in proteins. Proteins 18:309–317PubMedCrossRefGoogle Scholar
  32. Gulyás-Kovács A (2012) Integrated analysis of residue coevolution and protein structure in ABC transporters. PLoS ONE 7(5):e36546. PubMedPubMedCentralCrossRefGoogle Scholar
  33. Halabi N, Rivoire O, Leibler S, Ranganathan R (2009) Protein sectors: evolutionary units of three-dimensional structure. Cell 138:774–786PubMedPubMedCentralCrossRefGoogle Scholar
  34. Havel TF, Kuntz ID, Crippen GM (1983) The combinatorial distance geometry method for the calculation of molecular conformation. I. A new approach to an old problem. J Theor Biol 104:359–381Google Scholar
  35. Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS (2012) Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149:1607–1621. PubMedPubMedCentralCrossRefGoogle Scholar
  36. Hopf TA, Schärfe CPI, Rodrigues JPGLM, Green AG, Kohlbacher O, Bonvin, AMJJ, Sander C, Marks DS (2014) Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3:e03430. PubMedCentralCrossRefPubMedGoogle Scholar
  37. Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, Marks DS (2017) Mutation effects predicted from sequence co-variation. Nature Biotech 35:128–135. CrossRefGoogle Scholar
  38. Ingraham J, Marks D (2016) Variational inference for sparse and undirected models. arXiv:1602.03807 [stat.ML]Google Scholar
  39. Jacquin H, Gilson A, Shakhnovich E, Cocco S, Monasson R (2016) Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models. PLoS Comput Biol 12:e1004889. PubMedPubMedCentralCrossRefGoogle Scholar
  40. Johnson LS, Eddy SR, Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinf 11:431CrossRefGoogle Scholar
  41. Jones DT (2001) Predicting novel protein folds by using FRAGFOLD. Proteins 45(S5):127–132CrossRefGoogle Scholar
  42. Jones DT, Bryson K, Coleman A, McGuffin LJ, Sadowski MI, Sodhi JS, Ward JJ (2005) Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins 61(S7):143–151. PubMedCrossRefGoogle Scholar
  43. Jones DT, Buchan DWA, Cozzetto D, Pontil M (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28:184–190. PubMedCrossRefGoogle Scholar
  44. Jones DT, Singh T, Kosciolek T, Tetchner S (2015) MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 31:999–1006. PubMedCrossRefGoogle Scholar
  45. Kaján L, Hopf TA, Kalaš M, Marks DS, Rost B (2014) FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC Bioinf 15:85CrossRefGoogle Scholar
  46. Kamisetty H, Ovchinnikov S, Baker D (2013) Assessing the utility of coevolution-based residue- residue contact predictions in a sequence-and structure-rich era. Proc Natl Acad Sci USA 110:15674–15679. PubMedCrossRefGoogle Scholar
  47. Kim DE, Chivian D, Baker D (2004) Protein structure prediction and analysis using the Rosetta server. Nucl Acid Res 32:W526–W531CrossRefGoogle Scholar
  48. Kim DE, Blum B, Bradley P, Baker D (2009) Sampling bottlenecks in de novo protein structure prediction. J Mol Biol 393:249–260PubMedPubMedCentralCrossRefGoogle Scholar
  49. Kosciolek T, Jones DT (2014) De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS ONE 9:e92197. PubMedPubMedCentralCrossRefGoogle Scholar
  50. Kosciolek T, Jones DT (2016) Accurate contact predictions using covariation techniques and machine learning. Proteins 84(S1):145–151. PubMedCrossRefGoogle Scholar
  51. Lapedes AS, Giraud BG, Liu LC, Stormo GD (1999) Correlated mutations in protein sequences: phylogenetic and structural effects. In: Seillier-Moiseiwitsch F (ed) IMS lecture notes: statistics in molecular biology and genetics: selected proceedings of the joint AMS-IMS-SIAM summer conference on statistics in molecular biology, 22–26 June 1997, pp 345–352. Institute of Mathematical StatisticsGoogle Scholar
  52. Lapedes A, Giraud B, Jarzynsk C (2002) Using sequence alignments to predict protein structure and stability with high accuracy. LANL Sciece Magagine LA-UR-02-4481Google Scholar
  53. Lapedes A, Giraud B, Jarzynsk C (2012) Using sequence alignments to predict protein structure and stability with high accuracy. arXiv:1207.2484 [q-bio.QM]Google Scholar
  54. Maisnier-Patin S, Andersson DI (2004) Adaptation to the deleterious effect of antimicrobial drug resistance mutations by compensatory evolution. Res Microbiol 155:360–369PubMedCrossRefGoogle Scholar
  55. Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C (2011) Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6(12):e28766. PubMedPubMedCentralCrossRefGoogle Scholar
  56. Marks DS, Hopf TA, Sander C (2012) Protein structure prediction from sequence variation. Nat Biotech 30:1072–1080. CrossRefGoogle Scholar
  57. Martin LC, Gloor GB, Dunn SD, Wahl LM (2005) Using information theory to search for co-evolving residues in proteins. Bioinformatics 21:4116–4124PubMedCrossRefGoogle Scholar
  58. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092CrossRefGoogle Scholar
  59. Miyazawa S (2013) Prediction of contact residue pairs based on co-substitution between sites in protein structures. PLoS ONE 8(1):e54252. PubMedPubMedCentralCrossRefGoogle Scholar
  60. Miyazawa S (2017a) Prediction of structures and interactions from genome information. arXiv:1709.08021 [q-bio.BM]Google Scholar
  61. Miyazawa S (2017b) Selection originating from protein stability/foldability: relationships between protein folding free energy, sequence ensemble, and fitness. J Theor Biol 433:21–38. PubMedCrossRefGoogle Scholar
  62. Miyazawa S, Jernigan RL (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term for simulation and threading. J Mol Biol 256:623–644. PubMedCrossRefGoogle Scholar
  63. Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M (2011) Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA 108:E1293–E1301. PubMedCrossRefGoogle Scholar
  64. Morcos F, Schafer NP, Cheng RR, Onuchic JN, Wolynes PG (2014) Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci USA 111:12408–12413. PubMedCrossRefGoogle Scholar
  65. Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A (2016) Critical assessment of methods of protein structure prediction: progress and new directions in round XI. Proteins 84(S1):4–14. PubMedPubMedCentralCrossRefGoogle Scholar
  66. Nugent T, Jones DT (2012) Accurate de novo structure prediction of large transmembrane protein domains using fragmentassembly and correlated mutation analysis. Proc Natl Acad Sci USA 109:E1540–E1547. PubMedCrossRefGoogle Scholar
  67. Ovchinnikov S, Kim DE, Wang RYR, Liu Y, DiMaio F, Baker D (2016) Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins 84(S1):67–75. PubMedPubMedCentralCrossRefGoogle Scholar
  68. Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271:511–523PubMedCrossRefGoogle Scholar
  69. Pollock DD, Taylor WR (1997) Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution. Protein Eng 10:647–657PubMedCrossRefGoogle Scholar
  70. Pollock DD, Taylor WR, Goldman N (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol 287:187–198PubMedCrossRefGoogle Scholar
  71. Poon AFY, Lewis FI, Frost SDW, Kosakovsky Pond SL (2008) Spidermonkey: rapid detection of co-evolving sites using Bayesian graphical models. Bioinformatics 24:1949–1950PubMedPubMedCentralCrossRefGoogle Scholar
  72. Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9:173–175CrossRefGoogle Scholar
  73. Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. IEEE Int Conf Neural Netw 1993:586–591CrossRefGoogle Scholar
  74. Russ WP, Lowery DM, Mishra P, Yaffe MB, Ranganathan R (2005) Natural-like function in artificial WW domains. Nature 437:579–583PubMedCrossRefGoogle Scholar
  75. Seemayer S, Gruber M, Söding J (2014) CCMpred-fast and precise prediction of protein residue- residue contacts from correlated mutations. Bioinformatics 30:3128–3130. PubMedPubMedCentralCrossRefGoogle Scholar
  76. Sfriso P, Duran-Frigola M, Mosca R, Emperador A, Aloy P, Orozco M (2016) Residues coevolution guides the systematic identification of altemative functional conformations in proteins. Structure 24:116–126. PubMedCrossRefGoogle Scholar
  77. Shendure J, Ji H (2017) EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinf 18:303. CrossRefGoogle Scholar
  78. Shindyalov IN, Kolchanov NA, Sander C (1994) Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng 7:349–358PubMedCrossRefGoogle Scholar
  79. Skerker JM, Perchuk BS, Siryapom A, Lubin EA, Ashenberg O, Goulian M, Laub MT (2008) Rewiring the specificity of two-component signal transduction systems. Cell 133:1043–1054PubMedPubMedCentralCrossRefGoogle Scholar
  80. Skwark MJ, Abdel-Rehim A, Elofsson A (2013) PconsC: combination of direct information methods and alignments improves contact prediction. Bioinformatics 29:1815–1816PubMedCrossRefGoogle Scholar
  81. Skwark MJ, Raimondi D, Michel M, Elofsson A (2014) Improved contact predictions using the recognition of protein like contact patterns. PLoS Comput Biol 10:e1003889. PubMedPubMedCentralCrossRefGoogle Scholar
  82. Skwark MJ, Michel M, Hurtado DM, Ekeberg M, Elofsson A (2016) Accurate contact predictions for thousands of protein families using PconsC3. bioRXiv.
  83. Sufkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN (2012) Genomics-aided structure prediction. Proc Natl Acad Sci USA 109:10340–10345. CrossRefGoogle Scholar
  84. Sutto L, Marsili S, Valencia A, Gervasio FL (2015) From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci USA 112:13567–13572. PubMedCrossRefGoogle Scholar
  85. Talavera D, Lovell SC, Whelan S (2015) Covariation is a poor measure of molecular coevolution. Mol Biol Evol 32:2456-2468. PubMedPubMedCentralCrossRefGoogle Scholar
  86. Taylor WR, Sadowski MI (2011) Structural constraints on the covariance matrix derived from multiple aligned protein sequences. PLoS ONE 6(12):e28265. PubMedPubMedCentralCrossRefGoogle Scholar
  87. Tokuriki N, Tawfik DS (2009) Protein dynamism and evolvability. Science 324:203–207PubMedCrossRefGoogle Scholar
  88. Toth-Petroczy A, Palmedo P, Ingraham J, Hopf TA, Berger B, Sander C, Marks DS (2016) Structured states of disordered proteins from genomic sequences. Cell 167:158–170. PubMedPubMedCentralCrossRefGoogle Scholar
  89. Tufféry P, Darlu P (2000) Exploring a phylogenetic approach for the detection of correlated substitutions in proteins. Mol Biol Evol 17:1753–1759CrossRefGoogle Scholar
  90. Wang S, Sun S, Li Z, Zhang R, Xu J (2017) Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol 13:e1004324. Google Scholar
  91. Weigt M, White RA, Szurmant H, Hoch JA, Hwa T (2009) Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci USA 106:67–72. PubMedCrossRefGoogle Scholar
  92. Weinreb C, Riesselman AJ, Ingraham JB, Gross T, Sander C, Marks DS (2016) 3D RNA and functional interactions from evolutionary couplings. Cell 165:1–13. CrossRefGoogle Scholar
  93. Wuyun Q, Zheng W, Peng Z, Yang J (2016) A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 19:219–230. Google Scholar
  94. Yanovsky C, Hom V, Thorpe D Protein structure relationships revealed by mutation analysis. Science 146:1593–1594 (1964)CrossRefGoogle Scholar

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Sanzo Miyazawa
    • 1
  1. 1.Gunma UniversityKiryuJapan

Personalised recommendations