Prediction of Structures and Interactions from Genome Information

Miyazawa, Sanzo

doi:10.1007/978-981-13-2200-6_9

Sanzo Miyazawa

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 1105)

Abstract

Predicting three-dimensional residue-residue contacts from evolutionary information in protein sequences was already attempted in the early 1990s. However, the contact prediction accuracies of methods evaluated in CASP experiments before CASP11 remained quite low, typically with <20% true positives. Recently, contact prediction has improved significantly, to the level that an accurate three-dimensional model of a large protein can be generated on the basis of predicted contacts. This improvement was attained by disentangling direct from indirect correlations in amino acid covariations or cosubstitutions between sites in protein evolution. Here, we review statistical methods for extracting causative correlations and various approaches to describing protein structure, complexes, and flexibility based on predicted contacts.

References

Adhikari B, Bhattacharya D, Cao R, Cheng J (2015) CONFOLD: residue-residue contact-guided ab initio protein folding. Proteins 83:1436–1449. https://doi.org/10.1002/prot.24829
Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J (2016) ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinf 17:517. https://doi.org/10.1186/s12859-016-1404-z
Altschuh D, Vernet T, Berti P, Moras D, Nagai K (1988) Coordinated amino acid changes in homologous protein families. Protein Eng 2:193–199
Anishchenko I, Ovchinnikov S, Kamisetty H, Baker D (2017) Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci USA 114:9122–9127. https://doi.org/10.1073/pnas.1702664114
Atchley WR, Wollenberg KR, Fitch WM, Terhalle W, Dress AW (2000) Correlations among amino acid sites in bHLH protein domains: an information theoretic analysis. Mol Biol Evol 17:164–178
Balakrishnan S, Kamisetty H, Carbonell JG, Lee SI, Langmead CJ (2011) Learning generative models for protein fold families. Proteins 79:1061–1078. https://doi.org/10.1002/prot.22934
Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, Pagnani A (2014) Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS ONE 9(3):e92721. https://doi.org/10.1371/journal.pone.0092721
Barton JP, Leonardis ED, Coucke A, Cocco S (2016) ACE: adaptive cluster expansion for maximum entropy graphical model inference. Bioinformatics 32:3089–3097. https://doi.org/10.1093/bioinformatics/btw328
Braun W, Go N (1985) Calculation of protein conformations by proton-proton distance constraints: a new efficient algorithm. J Mol Biol 186:611–626. https://doi.org/10.1016/0022-2836(85)90134-2
Brünger AT (2007) Version 1.2 of the crystallography and NMR system. Nat Protoc 2:2728–2733. https://doi.org/10.1038/nprot.2007.406
Burger L, van Nimwegen E (2008) Accurate prediction of protein-protein interactions from sequence alignments using a Bayesian method. Mol Syst Biol 4:165
Burger L, van Nimwegen E (2010) Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput Biol 6(1):e1000633. https://doi.org/10.1371/journal.pcbi.1000633
CASP12 (2017) 12th community wide experiment on the critical assessment of techniques of protein structure prediction. http://predictioncenter.org/casp12/
Cocco S, Monasson R (2011) Adaptive cluster expansion for inferring Boltzmann machines with noisy data. Phys Rev Lett 106:090601. https://doi.org/10.1103/PhysRevLett.106.090601
Cocco S, Monasson R (2012) Adaptive cluster expansion for the inverse Ising problem: convergence, algorithm and tests. J Stat Phys 147:252–314. https://doi.org/10.1007/s10955-012-0463-4
Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M (2017) Inverse statistical physics of protein sequences: a key issues review. arXiv:1703.01222 [q-bio.BM]
Doron-Faigenboim A, Pupko T (2007) A combined empirical and mechanistic codon model. Mol Biol Evol 24:388–397
Dunn SD, Wahl LM, Gloor GB (2008) Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 24:333–340
Dutheil J (2012) Detecting coevolving positions in a molecule: why and how to account for phylogeny. Brief Bioinf 13:228–243
Dutheil J, Galtier N (2007) Detecting groups of coevolving positions in a molecule: a clustering approach. BMC Evol Biol 7:242
Dutheil J, Pupko T, Jean-Marie A, Galtier N (2005) A model-based approach for detecting coevolving positions in a molecule. Mol Biol Evol 22:1919–1928
Ekeberg M, Lövkvist C, Lan Y, Weigt M, Aurell E (2013) Improved contact prediction in proteins: using pseudolikelihoods to infer Potts models. Phys Rev E 87:012707–1–16. https://doi.org/10.1103/PhysRevE.87.012707
Ekeberg M, Hartonen T, Aurell E (2014) Fast pseudolikelihood maximization for direct-coupling analysis of protein structure from many homologous amino-acid sequences. J Comput Phys 276:341–356
Fares M, Travers S (2006) A novel method for detecting intramolecular coevolution. Genetics 173:9–23
Fariselli P, Olmea O, Valencia A, Casadio R (2001) Prediction of contact maps with neural networks and correlated mutations. Protein Eng 14:835–843
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, Potter SC, Punta M, Qureshi M, Sangrador-Vegas A, Salazar GA, Tate J, Bateman A (2016) The Pfam protein families database: towards a more sustainable future. Nucl Acid Res 44:D279–D285. https://doi.org/10.1093/nar/gkv1344
Fitch WM, Markowitz E (1970) An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem Genet 4:579–593
Fleishman SJ, Yifrach O, Ben-Tal N (2004) An evolutionarily conserved network of amino acids mediates gating in voltage-dependent potassium channels. J Mol Biol 340:307–318
Fodor AA, Aldrich RW (2004) Influence of conservation on calculations of amino acid covariance in multiple sequence alignment. Proteins 56:211–221
Giraud BG, Heumann JM, Lapedes AS (1999) Superadditive correlation. Phys Rev E 59:4973–4991
Göbel U, Sander C, Schneider R, Valencia A (1994) Correlated mutations and residue contacts in proteins. Proteins 18:309–317
Gulyás-Kovács A (2012) Integrated analysis of residue coevolution and protein structure in ABC transporters. PLoS ONE 7(5):e36546. https://doi.org/10.1371/journal.pone.0036546
Halabi N, Rivoire O, Leibler S, Ranganathan R (2009) Protein sectors: evolutionary units of three-dimensional structure. Cell 138:774–786
Havel TF, Kuntz ID, Crippen GM (1983) The combinatorial distance geometry method for the calculation of molecular conformation. I. A new approach to an old problem. J Theor Biol 104:359–381
Hopf TA, Colwell LJ, Sheridan R, Rost B, Sander C, Marks DS (2012) Three-dimensional structures of membrane proteins from genomic sequencing. Cell 149:1607–1621. https://doi.org/10.1016/j.cell.2012.04.012
Hopf TA, Schärfe CPI, Rodrigues JPGLM, Green AG, Kohlbacher O, Bonvin AMJJ, Sander C, Marks DS (2014) Sequence co-evolution gives 3D contacts and structures of protein complexes. eLife 3:e03430. https://doi.org/10.7554/eLife.03430
Hopf TA, Ingraham JB, Poelwijk FJ, Schärfe CPI, Springer M, Sander C, Marks DS (2017) Mutation effects predicted from sequence co-variation. Nature Biotech 35:128–135. https://doi.org/10.1038/nbt.3769
Ingraham J, Marks D (2016) Variational inference for sparse and undirected models. arXiv:1602.03807 [stat.ML]
Jacquin H, Gilson A, Shakhnovich E, Cocco S, Monasson R (2016) Benchmarking inverse statistical approaches for protein structure and design with exactly solvable models. PLoS Comput Biol 12:e1004889. https://doi.org/10.1371/journal.pcbi.1004889
Johnson LS, Eddy SR, Portugaly E (2010) Hidden Markov model speed heuristic and iterative HMM search procedure. BMC Bioinf 11:431
Jones DT (2001) Predicting novel protein folds by using FRAGFOLD. Proteins 45(S5):127–132
Jones DT, Bryson K, Coleman A, McGuffin LJ, Sadowski MI, Sodhi JS, Ward JJ (2005) Prediction of novel and analogous folds using fragment assembly and fold recognition. Proteins 61(S7):143–151. https://doi.org/10.1002/prot.20731
Jones DT, Buchan DWA, Cozzetto D, Pontil M (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28:184–190. https://doi.org/10.1093/bioinformatics/btr638
Jones DT, Singh T, Kosciolek T, Tetchner S (2015) MetaPSICOV: combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics 31:999–1006. https://doi.org/10.1093/bioinformatics/btu791
Kaján L, Hopf TA, Kalaš M, Marks DS, Rost B (2014) FreeContact: fast and free software for protein contact prediction from residue co-evolution. BMC Bioinf 15:85
Kamisetty H, Ovchinnikov S, Baker D (2013) Assessing the utility of coevolution-based residue-residue contact predictions in a sequence- and structure-rich era. Proc Natl Acad Sci USA 110:15674–15679. https://doi.org/10.1073/pnas.1314045110
Kim DE, Chivian D, Baker D (2004) Protein structure prediction and analysis using the Rosetta server. Nucl Acid Res 32:W526–W531
Kim DE, Blum B, Bradley P, Baker D (2009) Sampling bottlenecks in de novo protein structure prediction. J Mol Biol 393:249–260
Kosciolek T, Jones DT (2014) De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS ONE 9:e92197. https://doi.org/10.1371/journal.pone.0092197
Kosciolek T, Jones DT (2016) Accurate contact predictions using covariation techniques and machine learning. Proteins 84(S1):145–151. https://doi.org/10.1002/prot.24863
Lapedes AS, Giraud BG, Liu LC, Stormo GD (1999) Correlated mutations in protein sequences: phylogenetic and structural effects. In: Seillier-Moiseiwitsch F (ed) IMS lecture notes: statistics in molecular biology and genetics: selected proceedings of the joint AMS-IMS-SIAM summer conference on statistics in molecular biology, 22–26 June 1997, pp 345–352. Institute of Mathematical Statistics
Lapedes A, Giraud B, Jarzynski C (2002) Using sequence alignments to predict protein structure and stability with high accuracy. LANL Science Magazine LA-UR-02-4481
Lapedes A, Giraud B, Jarzynski C (2012) Using sequence alignments to predict protein structure and stability with high accuracy. arXiv:1207.2484 [q-bio.QM]
Maisnier-Patin S, Andersson DI (2004) Adaptation to the deleterious effect of antimicrobial drug resistance mutations by compensatory evolution. Res Microbiol 155:360–369
Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C (2011) Protein 3D structure computed from evolutionary sequence variation. PLoS ONE 6(12):e28766. https://doi.org/10.1371/journal.pone.0028766
Marks DS, Hopf TA, Sander C (2012) Protein structure prediction from sequence variation. Nat Biotech 30:1072–1080. https://doi.org/10.1038/nbt.2419
Martin LC, Gloor GB, Dunn SD, Wahl LM (2005) Using information theory to search for co-evolving residues in proteins. Bioinformatics 21:4116–4124
Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21:1087–1092
Miyazawa S (2013) Prediction of contact residue pairs based on co-substitution between sites in protein structures. PLoS ONE 8(1):e54252. https://doi.org/10.1371/journal.pone.0054252
Miyazawa S (2017a) Prediction of structures and interactions from genome information. arXiv:1709.08021 [q-bio.BM]
Miyazawa S (2017b) Selection originating from protein stability/foldability: relationships between protein folding free energy, sequence ensemble, and fitness. J Theor Biol 433:21–38. https://doi.org/10.1016/j.jtbi.2017.08.018
Miyazawa S, Jernigan RL (1996) Residue-residue potentials with a favorable contact pair term and an unfavorable high packing density term for simulation and threading. J Mol Biol 256:623–644. https://doi.org/10.1006/jmbi.1996.0114
Morcos F, Pagnani A, Lunt B, Bertolino A, Marks DS, Sander C, Zecchina R, Onuchic JN, Hwa T, Weigt M (2011) Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc Natl Acad Sci USA 108:E1293–E1301. https://doi.org/10.1073/pnas.1111471108
Morcos F, Schafer NP, Cheng RR, Onuchic JN, Wolynes PG (2014) Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci USA 111:12408–12413. https://doi.org/10.1073/pnas.1413575111
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A (2016) Critical assessment of methods of protein structure prediction: progress and new directions in round XI. Proteins 84(S1):4–14. https://doi.org/10.1002/prot.25064
Nugent T, Jones DT (2012) Accurate de novo structure prediction of large transmembrane protein domains using fragment assembly and correlated mutation analysis. Proc Natl Acad Sci USA 109:E1540–E1547. https://doi.org/10.1073/pnas.1120036109
Ovchinnikov S, Kim DE, Wang RYR, Liu Y, DiMaio F, Baker D (2016) Improved de novo structure prediction in CASP11 by incorporating coevolution information into Rosetta. Proteins 84(S1):67–75. https://doi.org/10.1002/prot.24974
Pazos F, Helmer-Citterich M, Ausiello G, Valencia A (1997) Correlated mutations contain information about protein-protein interaction. J Mol Biol 271:511–523
Pollock DD, Taylor WR (1997) Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution. Protein Eng 10:647–657
Pollock DD, Taylor WR, Goldman N (1999) Coevolving protein residues: maximum likelihood identification and relationship to structure. J Mol Biol 287:187–198
Poon AFY, Lewis FI, Frost SDW, Kosakovsky Pond SL (2008) Spidermonkey: rapid detection of co-evolving sites using Bayesian graphical models. Bioinformatics 24:1949–1950
Remmert M, Biegert A, Hauser A, Söding J (2012) HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment. Nat Methods 9:173–175
Riedmiller M, Braun H (1993) A direct adaptive method for faster backpropagation learning: the RPROP algorithm. IEEE Int Conf Neural Netw 1993:586–591
Russ WP, Lowery DM, Mishra P, Yaffe MB, Ranganathan R (2005) Natural-like function in artificial WW domains. Nature 437:579–583
Seemayer S, Gruber M, Söding J (2014) CCMpred-fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics 30:3128–3130. https://doi.org/10.1093/bioinformatics/btu500
Sfriso P, Duran-Frigola M, Mosca R, Emperador A, Aloy P, Orozco M (2016) Residues coevolution guides the systematic identification of alternative functional conformations in proteins. Structure 24:116–126. https://doi.org/10.1016/j.str.2015.10.025
Stahl K, Schneider M, Brock O (2017) EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. BMC Bioinf 18:303. https://doi.org/10.1186/s12859-017-1713-x
Shindyalov IN, Kolchanov NA, Sander C (1994) Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations? Protein Eng 7:349–358
Skerker JM, Perchuk BS, Siryaporn A, Lubin EA, Ashenberg O, Goulian M, Laub MT (2008) Rewiring the specificity of two-component signal transduction systems. Cell 133:1043–1054
Skwark MJ, Abdel-Rehim A, Elofsson A (2013) PconsC: combination of direct information methods and alignments improves contact prediction. Bioinformatics 29:1815–1816
Skwark MJ, Raimondi D, Michel M, Elofsson A (2014) Improved contact predictions using the recognition of protein like contact patterns. PLoS Comput Biol 10:e1003889. https://doi.org/10.1371/journal.pcbi.1003889
Skwark MJ, Michel M, Hurtado DM, Ekeberg M, Elofsson A (2016) Accurate contact predictions for thousands of protein families using PconsC3. bioRxiv. https://doi.org/10.1101/079673
Sułkowska JI, Morcos F, Weigt M, Hwa T, Onuchic JN (2012) Genomics-aided structure prediction. Proc Natl Acad Sci USA 109:10340–10345. https://doi.org/10.1073/pnas.1207864109
Sutto L, Marsili S, Valencia A, Gervasio FL (2015) From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci USA 112:13567–13572. https://doi.org/10.1073/pnas.1508584112
Talavera D, Lovell SC, Whelan S (2015) Covariation is a poor measure of molecular coevolution. Mol Biol Evol 32:2456–2468. https://doi.org/10.1093/molbev/msv109
Taylor WR, Sadowski MI (2011) Structural constraints on the covariance matrix derived from multiple aligned protein sequences. PLoS ONE 6(12):e28265. https://doi.org/10.1371/journal.pone.0028265
Tokuriki N, Tawfik DS (2009) Protein dynamism and evolvability. Science 324:203–207
Toth-Petroczy A, Palmedo P, Ingraham J, Hopf TA, Berger B, Sander C, Marks DS (2016) Structured states of disordered proteins from genomic sequences. Cell 167:158–170. https://doi.org/10.1016/j.cell.2016.09.010
Tufféry P, Darlu P (2000) Exploring a phylogenetic approach for the detection of correlated substitutions in proteins. Mol Biol Evol 17:1753–1759
Wang S, Sun S, Li Z, Zhang R, Xu J (2017) Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput Biol 13:e1005324. https://doi.org/10.1371/journal.pcbi.1005324
Weigt M, White RA, Szurmant H, Hoch JA, Hwa T (2009) Identification of direct residue contacts in protein-protein interaction by message passing. Proc Natl Acad Sci USA 106:67–72. https://doi.org/10.1073/pnas.0805923106
Weinreb C, Riesselman AJ, Ingraham JB, Gross T, Sander C, Marks DS (2016) 3D RNA and functional interactions from evolutionary couplings. Cell 165:1–13. https://doi.org/10.1016/j.cell.2016.03.030
Wuyun Q, Zheng W, Peng Z, Yang J (2016) A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 19:219–230. https://doi.org/10.1093/bib/bbw106
Yanofsky C, Horn V, Thorpe D (1964) Protein structure relationships revealed by mutational analysis. Science 146:1593–1594

Author information

Authors and Affiliations

Gunma University, Kiryu, Japan
Sanzo Miyazawa

Editor information

Editors and Affiliations

Institute for Protein Research, Osaka University, Suita, Osaka, Japan
Haruki Nakamura
European Bioinformatics Institute, Cambridgeshire, UK
Gerard Kleywegt
Rutgers, The State University of New Jersey, Piscataway, NJ, USA
Stephen K. Burley
Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
John L. Markley

Appendix

The appendix is described in full in the version of this article submitted to the arXiv (Miyazawa 2017a).

1.1 Inverse Potts Model

1.1.1 A Gauge Employed for $h_{i}(a_{k})$ and $J_{ij}(a_{k},a_{l})$

Unless otherwise specified, the following gauge is employed; we call it the q-gauge here.

$$\displaystyle \begin{aligned} h_{i}(a_{q})=J_{ij}(a_{k},a_{q})=J_{ij}(a_{q},a_{l})=0 {} \end{aligned} $$

(9.16)

In this gauge, the amino acid $a_{q}$ is the reference state for fields and couplings, and $P_{i}(a_{q})$, $P_{ij}(a_{k},a_{q})=P_{ji}(a_{q},a_{k})$, and $P_{ij}(a_{q},a_{q})$ are regarded as dependent variables. A common choice for the reference state $a_{q}$ is the most common (consensus) state at each site. Any gauge can be transformed to another by the following transformation.

$$\displaystyle \begin{aligned} J_{ij}^{\mathrm{I}}(a_{k},a_{l})&\equiv J_{ij}(a_{k},a_{l})-J_{ij}(\cdot,a_{l})-J_{ij}(a_{k}, \cdot)+J_{ij}(\cdot, \cdot) {} \end{aligned} $$

(9.17)

$$\displaystyle \begin{aligned} h_{i}^{\mathrm{I}}(a_{k})&\equiv h_{i}(a_{k})-h_{i}(\cdot)+\sum_{j\neq i}(J_{ij}(a_{k}, \cdot)-J_{ij}(\cdot, \cdot)) {} \end{aligned} $$

(9.18)

where “⋅” denotes the reference state, which may be $a_{q}$ for each site (q-gauge) or the average over all states (Ising gauge).
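The transformation in Eqs. 9.17 and 9.18 can be checked numerically. The following is a minimal sketch, not the author's code, assuming that "⋅" is the average over all q states (Ising gauge) and that couplings are stored as a symmetric array `J[i, j, k, l]` with `J[i, i] = 0`; the function names and the storage layout are hypothetical choices made for illustration. It verifies that the transformation shifts the log-weight of every sequence by the same constant, so the sequence distribution is unchanged.

```python
import itertools
import numpy as np

def to_ising_gauge(h, J):
    """Apply Eqs. 9.17-9.18 with "." taken as the average over all q states.

    h: (L, q) fields; J: (L, L, q, q) couplings with J[j, i] = J[i, j].T
    and J[i, i] = 0 (storage layout assumed for this sketch).
    """
    J_k_dot = J.mean(axis=3, keepdims=True)         # J_ij(a_k, .)
    J_dot_l = J.mean(axis=2, keepdims=True)         # J_ij(., a_l)
    J_dot_dot = J.mean(axis=(2, 3), keepdims=True)  # J_ij(., .)
    J_I = J - J_dot_l - J_k_dot + J_dot_dot
    # J[i, i] = 0, so summing over all j equals the sum over j != i.
    h_I = (h - h.mean(axis=1, keepdims=True)
           + (J_k_dot[..., 0] - J_dot_dot[..., 0]).sum(axis=1))
    return h_I, J_I

def log_weight(h, J, seq):
    """sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j), i.e. minus the Hamiltonian."""
    L = len(seq)
    total = sum(h[i, seq[i]] for i in range(L))
    total += sum(J[i, j, seq[i], seq[j]]
                 for i in range(L) for j in range(i + 1, L))
    return total

# Random toy parameters (illustrative values only).
rng = np.random.default_rng(0)
L, q = 4, 3
h = rng.normal(size=(L, q))
J = np.zeros((L, L, q, q))
for i in range(L):
    for j in range(i + 1, L):
        J[i, j] = rng.normal(size=(q, q))
        J[j, i] = J[i, j].T

h_I, J_I = to_ising_gauge(h, J)
shifts = np.array([log_weight(h_I, J_I, s) - log_weight(h, J, s)
                   for s in itertools.product(range(q), repeat=L)])
# The shift is the same for every sequence, so P(sigma) is gauge invariant.
print(shifts.max() - shifts.min())
```

In the Ising gauge the transformed couplings sum to zero over either state index, and the transformed fields sum to zero over states, which is a convenient sanity check on any implementation.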

1.1.2 Boltzmann Machine

Fields $h_{i}(a_{k})$ and couplings $J_{ij}(a_{k},a_{l})$ are estimated by iterating the following two-step procedure.

1. For a given set of $h_{i}(a_{k})$ and $J_{ij}(a_{k},a_{l})$, the marginal probabilities $P^{\mathrm{MC}}(\sigma_{i}=a_{k})$ and $P^{\mathrm{MC}}(\sigma_{i}=a_{k},\sigma_{j}=a_{l})$ are estimated by a Markov chain Monte Carlo method (the Metropolis-Hastings algorithm (Metropolis et al. 1953)) or by any other method (for example, the message passing algorithm (Weigt et al. 2009)).
2. Then, $h_{i}(a_{k})$ and $J_{ij}(a_{k},a_{l})$ are updated according to the gradient of the negative log-posterior-probability per instance, $\partial S_{0}/\partial h_{i}(a_{k})$ or $\partial S_{0}/\partial J_{ij}(a_{k},a_{l})$, multiplied by a parameter-specific weight factor (Barton et al. 2016), $w_{i}(a_{k})$ or $w_{ij}(a_{k},a_{l})$; see Eqs. 9.8 and 9.12.
$$\displaystyle \begin{aligned} \varDelta h_{i}(a_{k})&=-(P^{\mathrm{MC}}(\sigma_{i}=a_{k})+\frac{\partial R}{\partial h_{i}(a_{k})}-P_{i}(a_{k}))\cdot w_{i}(a_{k}) {} \end{aligned} $$
(9.19)

$$\displaystyle \begin{aligned} \varDelta J_{ij}(a_{k},a_{l})&=-(P^{\mathrm{MC}}(\sigma_{i}=a_{k},\sigma_{j}=a_{l})+\frac{\partial R}{\partial J_{ij}(a_{k},a_{l})}-P_{ij}(a_{k},a_{l}))\cdot w_{ij}(a_{k},a_{l}) {} \end{aligned} $$
(9.20)

where the weights are also updated as $w_{i}(a_{k})\leftarrow f(w_{i}(a_{k}))$ and $w_{ij}(a_{k},a_{l})\leftarrow f(w_{ij}(a_{k},a_{l}))$ according to the RPROP algorithm (Riedmiller and Braun 1993); the function $f(w)$ is defined as
$$\displaystyle \begin{aligned} f(w)\equiv\left\{\begin{array}{ll} \max(w\cdot s_{-},w_{\min}) & \mathrm{if}\ \mathrm{the}\ \mathrm{gradient}\ \mathrm{changes}\ \mathrm{its}\ \mathrm{sign},\\ \min(w\cdot s_{+},w_{\max}) & \mathrm{otherwise} \end{array}\right. {} \end{aligned} $$
(9.21)
$w_{\min }=10^{-3}$, $w_{\max }=10$, $s_{-}=0.5$, and $s_{+}=1.9<1/s_{-}$ were employed (Barton et al. 2016). After each update, $h_{i}(a_{k})$ and $J_{ij}(a_{k},a_{l})$ may be modified to satisfy a given gauge.

The Boltzmann machine has the merit that model correlations are calculated as part of the estimation.
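The two-step procedure above can be sketched on a toy system as follows. This is an illustration under stated assumptions rather than the implementation used in the chapter: marginals are computed by exact enumeration of all $q^{L}$ sequences in place of Markov chain Monte Carlo (feasible only for very small $L$), the regularizer $R$ is taken to be a plain $\ell_{2}$ penalty with a hypothetical strength `gamma`, and the initial weight `w0` is also a hypothetical choice. The weight update follows Eq. 9.21 with the parameter values quoted in the text.

```python
import itertools
import numpy as np

def all_sequences(L, q):
    return np.array(list(itertools.product(range(q), repeat=L)))

def model_marginals(h, J, seqs):
    """Exact marginals of P(s) ~ exp(sum_i h_i(s_i) + sum_{i<j} J_ij(s_i, s_j))."""
    L, q = h.shape
    logw = np.zeros(len(seqs))
    for i in range(L):
        logw += h[i, seqs[:, i]]
        for j in range(i + 1, L):
            logw += J[i, j, seqs[:, i], seqs[:, j]]
    m = logw.max()
    logZ = m + np.log(np.exp(logw - m).sum())
    p = np.exp(logw - logZ)
    Pi, Pij = np.zeros((L, q)), np.zeros((L, L, q, q))
    for i in range(L):
        np.add.at(Pi[i], seqs[:, i], p)
        for j in range(i + 1, L):
            np.add.at(Pij[i, j], (seqs[:, i], seqs[:, j]), p)
    return Pi, Pij, logZ

def rprop_weight(w, grad, prev_grad, s_minus=0.5, s_plus=1.9,
                 w_min=1e-3, w_max=10.0):
    """Eq. 9.21: shrink the weight if the gradient changed sign, else grow it."""
    flipped = grad * prev_grad < 0
    return np.where(flipped, np.maximum(w * s_minus, w_min),
                    np.minimum(w * s_plus, w_max))

def fit_boltzmann_machine(Pi, Pij, n_iter=200, gamma=1e-3, w0=1e-2):
    """gamma (l2 strength) and w0 (initial weight) are hypothetical choices."""
    L, q = Pi.shape
    seqs = all_sequences(L, q)
    upper = np.triu(np.ones((L, L), dtype=bool), k=1)[:, :, None, None]
    h, J = np.zeros((L, q)), np.zeros((L, L, q, q))
    wh, wJ = np.full_like(h, w0), np.full_like(J, w0)
    gh_prev, gJ_prev = np.zeros_like(h), np.zeros_like(J)
    history = []
    for _ in range(n_iter):
        # Step 1: model marginals (exact enumeration stands in for MCMC).
        Pi_m, Pij_m, logZ = model_marginals(h, J, seqs)
        history.append(logZ - (h * Pi).sum() - (J * Pij).sum()
                       + gamma * ((h ** 2).sum() + (J ** 2).sum()))
        # Step 2: weighted gradient update, Eqs. 9.19-9.20 (pairs i < j only).
        gh = Pi_m + 2 * gamma * h - Pi
        gJ = (Pij_m + 2 * gamma * J - Pij) * upper
        wh, wJ = rprop_weight(wh, gh, gh_prev), rprop_weight(wJ, gJ, gJ_prev)
        h -= wh * gh
        J -= wJ * gJ
        gh_prev, gJ_prev = gh, gJ
    return h, J, np.array(history)

# Target marginals from a random toy model (illustrative values only).
rng = np.random.default_rng(1)
L, q = 3, 3
h_true = rng.normal(scale=0.5, size=(L, q))
J_true = np.zeros((L, L, q, q))
for i in range(L):
    for j in range(i + 1, L):
        J_true[i, j] = rng.normal(scale=0.5, size=(q, q))
P_i, P_ij, _ = model_marginals(h_true, J_true, all_sequences(L, q))
h_fit, J_fit, history = fit_boltzmann_machine(P_i, P_ij)
```

The array `history` records the regularized cross entropy $S_{0}$ at each iteration, which gives a simple way to monitor whether the adaptive weights are making progress. With a sampler in place of the enumeration, the gradients become noisy, which is the situation the sign-based weight adaptation is meant to handle.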

1.1.3 Gaussian Approximation for P(σ) with a Normal-Inverse-Wishart Prior

The normal-inverse-Wishart distribution (NIW) is the product of the multivariate normal distribution $(\mathcal {N})$ and the inverse-Wishart distribution $(\mathcal {W}^{-1})$, which are the conjugate priors for the mean vector and for the covariance matrix of a multivariate Gaussian distribution, respectively. The NIW is employed as a prior in GaussDCA (Baldassi et al. 2014), in which the sequence distribution $P(\sigma)$ is approximated as a Gaussian distribution. In this approximation, the q-gauge is used, and $P_{i}(a_{q})$, $P_{ij}(a_{k},a_{q})=P_{ji}(a_{q},a_{k})$, and $P_{ij}(a_{q},a_{q})$ are regarded as dependent variables; see section “A Gauge Employed for $h_{i}(a_{k})$ and $J_{ij}(a_{k},a_{l})$”. In GaussDCA, deletion is excluded from the independent variables.

Because the NIW is a conjugate prior, the posterior distribution is also an NIW. Thus, the cross entropy $S_{0}$ can be represented as

$$\displaystyle \begin{aligned} S_{0}(\mu,\varSigma|\{P_{i}\},\{P_{ij}\})&=\frac{-1}{B}\log[\mathcal{N}(\mu|\mu^{B},\varSigma/\kappa^{B})\mathcal{W}^{-1}(\varSigma|\varLambda^{B},\nu^{B}) \\ &\quad (\det(2\pi\varSigma))^{-B/2}(\frac{\kappa}{\kappa^{B}})^{\dim\varSigma/2}\frac{(\det(\varLambda/2))^{\nu/2}}{(\det(\varLambda^{B}/2))^{\nu^{B}/2}}\frac{\varGamma_{\dim\varSigma}(\nu^{B}/2)}{\varGamma_{\dim\varSigma}(\nu/2)}(\det\varSigma)^{-(\nu - \nu^{B})/2}] {} \end{aligned} $$

(9.22)–(9.24)

where $\varGamma _{\dim \varSigma }(\nu /2)$ is the multivariate Γ function, μ is the mean vector, and $\dim \varSigma $ is the dimension of the covariance matrix Σ; $\dim \varSigma =(q-1)L$, excluding deletion, in GaussDCA. The normal and inverse-Wishart distributions are defined as follows.

$$\displaystyle \begin{aligned} \mathcal{N}(\mu|\mu^{0},\varSigma)&\equiv(\det(2\pi\varSigma))^{-1/2}\exp(-\frac{(\mu-\mu^{0})^{T}{{\varSigma^{-1}}}(\mu-\mu^{0})}{2}) {} \end{aligned} $$

(9.25)

$$\displaystyle \begin{aligned} \mathcal{W}^{-1}(\varSigma|\varLambda,\nu)&\equiv\frac{(\det(\varLambda/2))^{\nu/2}}{\varGamma_{\dim\varSigma}(\nu/2)}(\det\varSigma)^{-(\nu+\dim\varSigma+1)/2}\exp(-\frac{1}{2}\mathrm{Tr}\varLambda\varSigma^{-1}) {} \end{aligned} $$

(9.26)

Parameters $\mu^{B}$, $\kappa^{B}$, $\nu^{B}$, and $\varLambda^{B}$ satisfy

$$\displaystyle \begin{aligned} \mu_{i}^{B}(a_{k})&=(\kappa\mu_{i}^{0}(a_{k})+BP_{i}(a_{k}))/(\kappa+B)\ ,\ \kappa^{B}=\kappa+B\ ,\ \nu^{B}=\nu+B {} \end{aligned} $$

(9.27)

$$\displaystyle \begin{aligned} \varLambda_{ij}^{B}(a_{k},a_{l})&=\varLambda_{ij}(a_{k},a_{l})+BC_{ij}(a_{k},a_{l}) \\ &\quad +\frac{\kappa B}{\kappa+B}[(P_{i}(a_{k})-\mu_{i}^{0}(a_{k}))(P_{j}(a_{l})-\mu_{j}^{0}(a_{l}))] {} \end{aligned} $$

(9.28)

where Λ and ν are the scale matrix and the degrees of freedom, respectively, which shape the inverse-Wishart distribution, and $C$ is the given covariance matrix; $C_{ij}(a_{k},a_{l})\equiv P_{ij}(a_{k},a_{l})-P_{i}(a_{k})P_{j}(a_{l})$. The mean values of μ and Σ under the NIW posterior are $\mu^{B}$ and $\varLambda ^{B}/(\nu ^{B}-\dim \varSigma -1)$, and their mode values are $\mu^{B}$ and $\varLambda ^{B}/(\nu ^{B}+\dim \varSigma +1)$, which minimize the cross entropy or, equivalently, maximize the posterior probability. By adjusting the value of ν, the covariance matrix Σ can be estimated to be exactly the same value whether the posterior mean or the posterior maximum is employed for its estimation. In GaussDCA, the posterior mean estimate was employed, but here the maximum posterior estimate is employed according to the present formalism.

$$\displaystyle \begin{aligned} (\mu,\varSigma)=\arg\min\limits_{(\mu,\varSigma)}S_{0}(\mu,\varSigma|\{P_{i}\},\{P_{ij}\})=(\mu^{B},\varLambda^{B}/(\nu^{B}+\dim\varSigma+1)) {} \end{aligned} $$

(9.29)

According to GaussDCA, ν is chosen in such a way that $\varSigma_{ij}(a_{k},a_{l})$ is nearly equal to the covariance matrix corrected by pseudocount; $\nu =\kappa +\dim \varSigma +1$ for the posterior mean estimate in GaussDCA, but $\nu =\kappa -\dim \varSigma -1$ for the maximum posterior estimate here.
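The equivalence of the two estimates can be illustrated with a short numerical sketch of Eqs. 9.27 to 9.29, using the uninformative prior of Eq. 9.32. The toy data, the value of κ, and all function and variable names are assumptions made for illustration; site and state indices are flattened into a single index of dimension $(q-1)L$. With $\nu =\kappa +\dim \varSigma +1$ for the posterior mean and $\nu =\kappa -\dim \varSigma -1$ for the posterior mode, both estimates reduce to $\varLambda ^{B}/(\kappa +B)$.

```python
import numpy as np

def niw_posterior(P1, C, B, kappa, nu, mu0, Lam):
    """Posterior NIW parameters, Eqs. 9.27-9.28, with flattened indices."""
    mu_B = (kappa * mu0 + B * P1) / (kappa + B)
    dev = P1 - mu0
    Lam_B = Lam + B * C + (kappa * B / (kappa + B)) * np.outer(dev, dev)
    return mu_B, kappa + B, nu + B, Lam_B

# Toy alignment of B random sequences (illustrative data only).
rng = np.random.default_rng(0)
L, q, B = 5, 4, 200
seqs = rng.integers(q, size=(B, L))
# One-hot encoding that drops the last state as the dependent one.
X = (seqs[:, :, None] == np.arange(q - 1)).reshape(B, L * (q - 1)).astype(float)
P1 = X.mean(axis=0)                    # single-site frequencies P_i(a_k)
C = X.T @ X / B - np.outer(P1, P1)     # covariances C_ij(a_k, a_l)
dim = L * (q - 1)

kappa = 0.5 * B                        # hypothetical pseudocount strength
mu0 = np.full(dim, 1.0 / q)            # Eq. 9.32
block = (np.eye(q - 1) - 1.0 / q) / q  # (delta_kl - 1/q) / q
Lam = kappa * np.kron(np.eye(L), block)

# Posterior mean of Sigma with nu = kappa + dim + 1 (GaussDCA choice).
_, _, nu_B, Lam_B = niw_posterior(P1, C, B, kappa, kappa + dim + 1, mu0, Lam)
Sigma_mean = Lam_B / (nu_B - dim - 1)

# Posterior mode of Sigma with nu = kappa - dim - 1 (choice in the text).
_, _, nu_B, Lam_B = niw_posterior(P1, C, B, kappa, kappa - dim - 1, mu0, Lam)
Sigma_map = Lam_B / (nu_B + dim + 1)

# Mean-field-like couplings J = -Sigma^{-1} from the estimated covariance.
J_mf = -np.linalg.inv(Sigma_map)
```

Because $\varLambda ^{B}$ does not depend on ν, the two choices of ν differ only in the denominator, and both denominators equal $\kappa +B$.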

From Eq. 9.15, the estimates of couplings and fields are calculated.

$$\displaystyle \begin{aligned} J_{ij}^{\mathrm{NIW}}(a_{k},a_{l})=-\frac{\partial S_{0}(\{P_{i}\},\{P_{ij}\})}{\partial P_{ij}(a_{k},a_{l})}=-\frac{(\kappa+B+1)}{\kappa+B}(\varSigma^{-1})_{ij}(a_{k},a_{l}) {} \end{aligned} $$

(9.30)

Because the number of instances is far greater than 1 ($B\gg 1$), these estimates of couplings are practically equal to the estimates ($J^{\mathrm{MF}}=-\varSigma^{-1}$) in the mean field approximation, which was employed in GaussDCA (Baldassi et al. 2014).

$$\displaystyle \begin{aligned} h_{i}^{\mathrm{NIW}}(a_{k})&=-\sum_{j\neq i}\sum_{l}J_{ij}^{\mathrm{NIW}}(a_{k},a_{l})P_{j}(a_{l})-\frac{(\kappa+B+1)}{\kappa+B}\sum_{j}\sum_{l\neq q}(\varSigma^{-1})_{ij}(a_{k},a_{l}) \\ &\quad [\delta_{ij}\frac{\delta_{kl}-2P_{i}(a_{l})}{2}+\frac{\kappa B}{\kappa+B}(P_{j}(a_{l})-\mu_{j}^{0}(a_{l}))] {} \end{aligned} $$

(9.31)

The difference $(h_{i}^{\mathrm {NIW}}(a_{k})-h_{i}^{\mathrm {NIW}}(a_{q}))$ does not converge to $\log P_{i}(a_{k})/P_{i}(a_{q})$ as $J^{\mathrm{NIW}}\rightarrow 0$, but $h_{i}^{\mathrm {MF}}(a_{k})- h_{i}^{\mathrm {MF}}(a_{q})$ does; in other words, the mean field approximation gives a better $h$ for the limiting case of no couplings than the present approximation. Barton et al. (2016) reported that the Gaussian approximation generally gave a better generative model than the mean field approximation.

In GaussDCA (Baldassi et al. 2014), $\mu^{0}$ and $\varLambda/\kappa$ were chosen to be as uninformative as possible, i.e., the mean and covariance for a uniform distribution.

$$\displaystyle \begin{aligned} \mu_{i}^{0}(a_{k})=1/q,\quad \frac{\varLambda_{ij}(a_{k},a_{l})}{\kappa}=\frac{\delta_{ij}}{q}(\delta_{kl}-\frac{1}{q}) {} \end{aligned} $$

(9.32)

1.1.4 Pseudo-likelihood Approximation

Symmetric Pseudo-likelihood Maximization

The probability of an instance $\sigma^{\tau}$ is approximated by the product of the conditional probabilities of observing $\sigma _{i}^{\tau }$ given the observations $\sigma _{j\neq i}^{\tau }$ at all other sites.

$$\displaystyle \begin{aligned} P(\sigma^{\tau})\approx\prod_{i}P(\sigma_{i}=\sigma_{i}^{\tau}|\{\sigma_{j\neq i}=\sigma_{j}^{\tau}\}) {} \end{aligned} $$

(9.33)

Then, cross entropy is approximated as

$$\displaystyle \begin{aligned} S_{0}(h,J|\{P_{i}\}, \{P_{ij}\})&\approx S_{0}^{\mathrm{PLM}}(h,J|\{P_{i}\},\{P_{ij}\})\equiv\sum_{i}S_{0,i}(h, J|\{P_{i}\}, \{P_{ij}\}) {} \end{aligned} $$

(9.34)

$$\displaystyle \begin{aligned} S_{0,i}(h, J|\{P_{i}\}, \{P_{ij}\})&\equiv\frac{-1}{B}\sum_{\tau}\ell_{i}(\sigma_{i}=\sigma_{i}^{\tau}|\{\sigma_{j\neq i}=\sigma_{j}^{\tau}\},h, J)+R_{i}(h,J) {} \end{aligned} $$

(9.35)

where the conditional log-likelihoods and the $\ell_{2}$ norm regularization terms employed in Ekeberg et al. (2013) are

$$\displaystyle \begin{aligned} \ell_{i}(\sigma_{i}=\sigma_{i}^{\tau}|\{\sigma_{j\neq i}=\sigma_{j}^{\tau}\},h, J)&=\log[\frac{\exp(h_{i}(\sigma_{i}^{\tau})+\sum_{j\neq i}J_{ij}(\sigma_{i}^{\tau},\sigma_{j}^{\tau}))}{\sum_{k}\exp(h_{i}(a_{k})+\sum_{j\neq i}J_{ij}(a_{k},\sigma_{j}^{\tau}))}]{}\end{aligned} $$

(9.36)

$$\displaystyle \begin{aligned} R_{i}(h, J)&\equiv\gamma_{h}\sum_{k}h_{i}(a_{k})^{2}+\frac{\gamma_{J}}{2}\sum_{k}\sum_{j\neq i}\sum_{l}J_{ij}(a_{k},a_{l})^{2} {} \end{aligned} $$

(9.37)
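As an illustration of Eqs. 9.35 to 9.37, the following is a minimal sketch of the per-site pseudo-cross-entropy in plain Python. It assumes unweighted sequences encoded as lists of integer states, fields stored as `h[i][k]`, and couplings stored as `J[i][j][k][l]`; these data layouts and all function names are hypothetical choices made for this sketch and are not taken from plmDCA.

```python
import math

def cond_loglik(i, seq, h, J):
    """Eq. 9.36: log P(sigma_i = seq[i] | all other sites of seq)."""
    L, q = len(h), len(h[0])
    # Score of every candidate state a_k at site i, given the rest of the sequence.
    scores = [h[i][k] + sum(J[i][j][k][seq[j]] for j in range(L) if j != i)
              for k in range(q)]
    shift = max(scores)  # log-sum-exp shift for numerical stability
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return scores[seq[i]] - log_norm

def l2_penalty(i, h, J, gamma_h, gamma_J):
    """Eq. 9.37: l2 regularization term R_i for site i."""
    L, q = len(h), len(h[0])
    field_term = gamma_h * sum(x * x for x in h[i])
    coupling_term = 0.5 * gamma_J * sum(
        J[i][j][k][l] ** 2
        for j in range(L) if j != i
        for k in range(q) for l in range(q))
    return field_term + coupling_term

def site_pseudo_cross_entropy(i, msa, h, J, gamma_h=0.01, gamma_J=0.01):
    """Eq. 9.35: S_{0,i}, averaged over the B sequences in msa (no reweighting)."""
    B = len(msa)
    mean_loglik = sum(cond_loglik(i, seq, h, J) for seq in msa) / B
    return -mean_loglik + l2_penalty(i, h, J, gamma_h, gamma_J)
```

Summing `site_pseudo_cross_entropy` over all sites gives the quantity on the right-hand side of Eq. 9.34. With all fields and couplings set to zero, each conditional probability is uniform, so every site contributes log q.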

The optimum fields and couplings in this approximation are estimated by minimizing the pseudo-cross-entropy, $S_{0}^{\mathrm {PLM}}$.

$$\displaystyle \begin{aligned} (h^{\mathrm{PLM}},J^{\mathrm{PLM}})=\arg\min\limits_{h,J}S_{0}^{\mathrm{PLM}}(h,J|\{P_{i}\}, \{P_{ij}\}) {} \end{aligned} $$

(9.38)

Equation 9.38 is not invariant under gauge transformation; the $\ell_{2}$ norm regularization terms in Eq. 9.38 favor only a specific gauge that corresponds to $\gamma_{J}\sum_{l}J_{ij}(a_{k},a_{l})=\gamma_{h}h_{i}(a_{k})$, $\gamma_{J}\sum_{k}J_{ij}(a_{k},a_{l})=\gamma_{h}h_{j}(a_{l})$, and $\sum_{k}h_{i}(a_{k})=0$ for all i, j(> i), k and l (Ekeberg et al. 2013). Ekeberg et al. (2013) employed $\gamma_{J}=\gamma_{h}=0.01$, a relatively large value that is independent of B. Hopf et al. (2017) employed $\gamma_{h}=0.01$ but $\gamma_{J}=q(L-1)\gamma_{h}$; in that work, gapped sites in each sequence were excluded from the calculation of the Hamiltonian H(σ), and therefore q = 20.

GREMLIN (Kamisetty et al. 2013) employs Gaussian prior probabilities that depend on site pairs.

$$\displaystyle \begin{aligned} R_{i}(h, J)&\equiv\gamma_{h}\sum_{k}h_{i}(a_{k})^{2}+\sum_{k}\sum_{j\neq i}\frac{\gamma_{ij}}{2}\sum_{l}J_{ij}(a_{k},a_{l})^{2} {} \end{aligned} $$

(9.39)

$$\displaystyle \begin{aligned} \gamma_{ij}&\equiv\gamma_{c}(1-\gamma_{p}\log(P_{ij}^{0})) {} \end{aligned} $$

(9.40)

where $P_{ij}^{0}$ is the prior probability of site pair (i, j) being in contact.

Asymmetric Pseudo-likelihood Maximization

To speed up the minimization of $S_{0}$, a further approximation is employed in which each $S_{0,i}$ is minimized separately (Ekeberg et al. 2014); fields and couplings are then estimated as follows.

$$\displaystyle \begin{aligned} J_{ij}^{\mathrm{PLM}}(a_{k},a_{l})&\simeq\frac{1}{2}(J_{ij}^{*}(a_{k},a_{l})+J_{ji}^{*}(a_{l},a_{k})) {} \end{aligned} $$

(9.41)

$$\displaystyle \begin{aligned} (h_{i}^{\mathrm{PLM}},J_{i}^{*})&=\arg\min\limits_{h_{i},J_{i}}S_{0,i}(h,J|\{P_{i}\}, \{P_{ij}\}) {} \end{aligned} $$

(9.42)

It is appropriate to transform h and J estimated above into some specific gauge such as the Ising gauge.
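The averaging step of Eq. 9.41 can be sketched as follows. The sketch assumes that the L separate minimizations of Eq. 9.42 have already produced an array `J_star[i][j][k][l]`, in which the entry for (i, j) comes from the minimization at site i; the name `symmetrize_couplings` and the nested-list layout are hypothetical.

```python
def symmetrize_couplings(J_star):
    """Eq. 9.41: average the two independent estimates of each coupling.

    J_star[i][j][k][l] is the estimate of J_ij(a_k, a_l) obtained from site i;
    J_star[j][i][l][k] is the estimate of the same quantity obtained from site j.
    """
    L = len(J_star)
    q = len(J_star[0][1]) if L > 1 else 0
    J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
    for i in range(L):
        for j in range(L):
            if i == j:
                continue  # no self-coupling
            for k in range(q):
                for l in range(q):
                    J[i][j][k][l] = 0.5 * (J_star[i][j][k][l] + J_star[j][i][l][k])
    return J
```

After this step the couplings satisfy J_ij(a_k, a_l) = J_ji(a_l, a_k) by construction, which the separate per-site estimates do not guarantee.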

1.1.5 ACE (Adaptive Cluster Expansion) of Cross-Entropy for Sparse Markov Random Field

The cross entropy $S_{0}(\{h_{i},J_{ij}\}|\{P_{i}\},\{P_{ij}\},i,j\in\varGamma)$ of a cluster of sites Γ, which is defined as the negative log-likelihood per instance in Eq. 9.14, is approximately minimized by taking account only of the sets $L_{k}(t)$ of significant clusters consisting of k sites, i.e., clusters whose incremental entropy (cluster cross entropy) $\varDelta S_{\varGamma}$ is significant ($|\varDelta S_{\varGamma}|>t$) (Cocco and Monasson 2011, 2012; Barton et al. 2016).

$$\displaystyle \begin{aligned} S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma\})&\simeq\sum_{l=1}^{|\varGamma|}\ \sum_{\varGamma'\in L_l(t),\varGamma'\subset\varGamma} \varDelta S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma'\}) {} \end{aligned} $$

(9.43)

$$\displaystyle \begin{aligned} \varDelta S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma\})&\equiv S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma\})-\sum_{\varGamma'\subset\varGamma}\varDelta S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma'\}) {} \end{aligned} $$

(9.44)

$$\displaystyle \begin{aligned} &=\sum_{\varGamma'\subseteq\varGamma}(-1)^{|\varGamma|-|\varGamma'|} {\ S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma'\})} {} \end{aligned} $$

(9.45)
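The equivalence of the recursive definition (Eq. 9.44) and the inclusion-exclusion form (Eq. 9.45) can be checked numerically on a small example. The sketch below treats the cluster cross entropy as an arbitrary function `S` of a set of sites, with the cross entropy of the empty cluster taken to be zero; `toy_cross_entropy` is a made-up stand-in, not a real cross entropy, and all names are hypothetical.

```python
import math
from itertools import combinations

def subsets(cluster, proper=False):
    """Yield the non-empty subsets of cluster (excluding cluster itself if proper)."""
    items = list(cluster)
    largest = len(items) - 1 if proper else len(items)
    for size in range(1, largest + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def delta_recursive(S, cluster):
    """Eq. 9.44: S(cluster) minus the increments of all proper sub-clusters."""
    cluster = frozenset(cluster)
    return S(cluster) - sum(delta_recursive(S, sub)
                            for sub in subsets(cluster, proper=True))

def delta_inclusion_exclusion(S, cluster):
    """Eq. 9.45: alternating sum of S over all non-empty subsets of cluster."""
    cluster = frozenset(cluster)
    return sum((-1) ** (len(cluster) - len(sub)) * S(sub)
               for sub in subsets(cluster))

def toy_cross_entropy(cluster):
    """Arbitrary illustrative set function (an assumption of this sketch)."""
    return math.log(1.0 + sum(cluster))
```

Summing the increments over every non-empty subset of Γ recovers S₀ of Γ exactly; Eq. 9.43 becomes an approximation only because clusters with |ΔS| ≤ t are dropped from the sum.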

$L_{k+1}(t)$ is constructed from $L_{k}(t)$ by adding clusters Γ consisting of (k + 1) sites according to one of two rules. In the lax case, Γ is added if $\varGamma^{1}\cup\varGamma^{2}=\varGamma$ for some pair of size k clusters $\varGamma^{1},\varGamma^{2}\in L_{k}(t)$; in the strict case, Γ is added only if $\varGamma'\in L_{k}(t)$ for every Γ′ such that Γ′⊂ Γ and |Γ′| = k. Thus, Eq. 9.43 yields sparse solutions. The cross entropies $S_{0}(\{P_{i},P_{ij}|i,j\in\varGamma'\})$ for clusters of small size are estimated by minimizing $S_{0}(\{h_{i},J_{ij}\}|\{P_{i}\},\{P_{ij}\},i,j\in\varGamma')$ with respect to fields and couplings. Starting from a large value of the threshold t (typically t = 1), the cross-entropy $S_{0}(\{P_{i},P_{ij}|i,j\in\{1,\ldots,N\}\})$ is calculated by gradually decreasing t until its value converges. Convergence of the algorithm may be more difficult for alignments of long proteins or those with very strong interactions; in such cases, strong regularization may be employed.
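The lax and strict construction rules can be sketched as a candidate-generation step. The function below only proposes clusters of size k + 1 from a list of size k clusters; selecting among them by the threshold |ΔS| > t is not included. The function name and the `rule` argument are hypothetical and do not correspond to the ACE software.

```python
from itertools import combinations

def candidate_clusters(L_k, rule="lax"):
    """Propose size-(k+1) clusters from the size-k clusters in L_k.

    lax:    the union of any two clusters in L_k that has exactly k+1 sites.
    strict: as lax, but every size-k subset of the union must be in L_k.
    """
    clusters = {frozenset(c) for c in L_k}
    if not clusters:
        return set()
    k = len(next(iter(clusters)))
    unions = {g1 | g2 for g1, g2 in combinations(clusters, 2)
              if len(g1 | g2) == k + 1}
    if rule == "lax":
        return unions
    # Any cluster passing the strict test is the union of two of its own
    # size-k subsets, so filtering the lax candidates loses nothing.
    return {g for g in unions
            if all(frozenset(sub) in clusters for sub in combinations(g, k))}
```

For example, from the pairs {1, 2} and {2, 3} the lax rule proposes {1, 2, 3}, whereas the strict rule proposes it only if {1, 3} is also significant.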

The following regularization terms of $\ell_{2}$ norm are employed in ACE (Barton et al. 2016), and so Eq. 9.43 is not invariant under gauge transformation.

$$\displaystyle \begin{aligned} -\frac{1}{B}\log P_{0}(h,J|i,j\in\varGamma)=\gamma_{h}\sum_{i\in\varGamma}\sum_{k}h_{i}(a_{k})^{2}+\gamma_{J}\sum_{i\in\varGamma}\sum_{k}\sum_{j>i,j\in\varGamma}\sum_{l}J_{ij}(a_{k},a_{l})^{2} {} \end{aligned} $$

(9.46)

$\gamma_{h}=\gamma_{J}\propto 1/B$ was employed (Barton et al. 2016).

The compression of the number of Potts states, $q_{i}\leq q$, at each site can be taken into account. All infrequently observed states, or states that contribute insignificantly to site entropy, can be treated as the same state, and a complete model can be recovered (Barton et al. 2016) by setting $h_{i}(a_{k})= h_{i}^{\prime }(a_{k^{\prime }})+\log (P_{i}(a_{k})/P_{i}^{\prime }(a_{k^{\prime }}))$, and $J_{ij}(a_{k},a_{l})=J_{ij}^{\prime }(a_{k^{\prime }},a_{l^{\prime }})$, where “′” denotes the corresponding aggregated states and potentials.
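A minimal sketch of this recovery step for a single site and a single site pair is given below. It assumes that `state_map[k]` gives the aggregated state k′ of each original state k and that every original state has a non-zero frequency (for example, after adding pseudocounts); the function names and argument layout are hypothetical.

```python
import math

def expand_fields(h_comp, state_map, P_full, P_comp):
    """h_i(a_k) = h'_i(a_k') + log(P_i(a_k) / P'_i(a_k')) for every original state k."""
    return [h_comp[state_map[k]] + math.log(P_full[k] / P_comp[state_map[k]])
            for k in range(len(state_map))]

def expand_couplings(J_comp, map_i, map_j):
    """J_ij(a_k, a_l) = J'_ij(a_k', a_l'): couplings are copied from the aggregated states."""
    return [[J_comp[map_i[k]][map_j[l]] for l in range(len(map_j))]
            for k in range(len(map_i))]
```

With this choice, the Boltzmann weights exp(h) of the original states that were merged into one aggregated state sum to the weight of that aggregated state.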

Starting from the output set of the fields $h_{i}(a_{k})$ and couplings $J_{ij}(a_{k},a_{l})$ obtained from the cluster expansion of the cross-entropy, a Boltzmann machine is trained with $P_{i}(a_{k})$ and $P_{ij}(a_{k},a_{l})$ by the RPROP algorithm (Riedmiller and Braun 1993) to refine the parameter values of $h_{i}(a_{k})$ and $J_{ij}(a_{k},a_{l})$ (Barton et al. 2016); see section “Boltzmann Machine”. This post-processing is also useful because model correlations are calculated.

For trypsin inhibitor, an appropriate value of the regularization parameter was much larger for contact prediction (γ = 1) than for recovering true fields and couplings (γ = 2∕B = 10⁻³) (Barton et al. 2016), probably because the task of contact prediction requires the relative ranking of interactions rather than their actual values.

1.1.6 Scoring Methods for Contact Prediction

Corrected Frobenius Norm ($L_{22}$ Matrix Norm), $\mathcal {S}_{ij}^{\mathrm {CFN}}$

For scoring, plmDCA (Ekeberg et al. 2013, 2014) employs the corrected Frobenius norm of $J_{ij}^{\mathrm {I}}$ transformed in the Ising gauge, in which $J_{ij}^{\mathrm {I}}$ does not contain anything that could have been explained by fields $h_{i}$ and $h_{j}$; $J_{ij}^{\mathrm {I}}(a_{k},a_{l})\equiv J_{ij}(a_{k},a_{l})-J_{ij}(\cdot ,a_{l})-J_{ij}(a_{k}, \cdot )+J_{ij}(\cdot , \cdot )$ where $J_{ij}( \cdot ,a_{l})=J_{ji}(a_{l}, \cdot )\equiv \sum _{k=1}^{q}J_{ij}(a_{k},a_{l})/q$.

$$\displaystyle \begin{aligned} \mathcal{S}_{ij}^{\mathrm{CFN}}\equiv \mathcal{S}_{ij}^{\mathrm{FN}}-\mathcal{S}_{\cdot j}^{\mathrm{FN}}\mathcal{S}_{i\cdot}^{\mathrm{FN}}/\mathcal{S}_{\cdot\cdot}^{\mathrm{FN}},\quad \mathcal{S}_{ij}^{\mathrm{FN}}\equiv\sqrt{\sum_{k\neq \mathrm{gap}}\sum_{l\neq \mathrm{gap}}J_{ij}^{\mathrm{I}}(a_{k},a_{l})^{2}} {} \end{aligned} $$

(9.47)

where “⋅” denotes the average over the indicated variable. This CFN score, with the gap state excluded as in Eq. 9.47, performs better (Ekeberg et al. 2014; Baldassi et al. 2014) than both the FN and the DI/EC scores (Weigt et al. 2009; Morcos et al. 2011; Marks et al. 2011; Hopf et al. 2012).
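The scoring procedure of Eq. 9.47 can be sketched in plain Python as follows. The sketch assumes couplings stored as `J[i][j][k][l]` with at least two sites, an optional integer index `gap` for the gap state, and averages "⋅" taken over off-diagonal site pairs only; this last choice and all function names are assumptions of the sketch rather than details given in the text.

```python
import math

def to_ising_gauge(Jij):
    """Subtract row, column, and overall means from one q x q coupling block."""
    q = len(Jij)
    row_mean = [sum(Jij[k]) / q for k in range(q)]                       # J_ij(a_k, .)
    col_mean = [sum(Jij[k][l] for k in range(q)) / q for l in range(q)]  # J_ij(., a_l)
    all_mean = sum(row_mean) / q                                         # J_ij(., .)
    return [[Jij[k][l] - col_mean[l] - row_mean[k] + all_mean
             for l in range(q)] for k in range(q)]

def frobenius_norm(Jij, gap=None):
    """Frobenius norm of a coupling block, skipping the gap state if given."""
    keep = [k for k in range(len(Jij)) if k != gap]
    return math.sqrt(sum(Jij[k][l] ** 2 for k in keep for l in keep))

def cfn_scores(J, gap=None):
    """Corrected Frobenius norm score for every site pair (Eq. 9.47)."""
    L = len(J)
    fn = [[0.0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            fn[i][j] = fn[j][i] = frobenius_norm(to_ising_gauge(J[i][j]), gap)
    # Averages over the partner site, excluding the diagonal (sketch assumption).
    site_mean = [sum(fn[i][j] for j in range(L) if j != i) / (L - 1)
                 for i in range(L)]
    total_mean = sum(site_mean) / L
    cfn = [[0.0] * L for _ in range(L)]
    if total_mean == 0.0:
        return cfn  # all couplings vanish; nothing to correct
    for i in range(L):
        for j in range(L):
            if i != j:
                cfn[i][j] = fn[i][j] - site_mean[i] * site_mean[j] / total_mean
    return cfn
```

Site pairs are then ranked by their CFN score, and the top-ranked pairs are taken as predicted contacts. The correction term lowers the score of pairs that involve sites with uniformly large norms, so that a pair stands out only relative to the background of its two sites.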

Cite this chapter

Miyazawa, S. (2018). Prediction of Structures and Interactions from Genome Information. In: Nakamura, H., Kleywegt, G., Burley, S., Markley, J. (eds) Integrative Structural Biology with Hybrid Methods. Advances in Experimental Medicine and Biology, vol 1105. Springer, Singapore. https://doi.org/10.1007/978-981-13-2200-6_9

DOI: https://doi.org/10.1007/978-981-13-2200-6_9
Published: 09 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2199-3
Online ISBN: 978-981-13-2200-6
eBook Packages: Biomedical and Life Sciences (R0)
