A Bayes-optimal sequence-structure theory that unifies protein sequence-structure recognition and alignment

Lathrop, Richard H.; Rogers, Robert G.; Smith, Temple F.; White, James V.

doi:10.1006/S0092-8240(98)90002-7


Published: November 1998

Volume 60, pages 1039–1071, (1998)

Bulletin of Mathematical Biology


Abstract

A rigorous Bayesian analysis is presented that unifies protein sequence-structure alignment and recognition. Given a sequence, explicit formulae are derived to select (1) its globally most probable core structure from a structure library; (2) its globally most probable alignment to a given core structure; (3) its most probable joint core structure and alignment chosen globally across the entire library; and (4) its most probable individual segments, secondary structure, and super-secondary structures across the entire library. The computations involved are NP-hard in the general case (3D-3D). Fast exact recursions for the restricted sequence singleton-only (1D-3D) case are given. Conclusions include: (a) the most probable joint core structure and alignment is not necessarily the most probable alignment of the most probable core structure, but rather maximizes the product of core and alignment probabilities; (b) use of a sequence-independent linear or affine gap penalty may result in the highest-probability threading not having the lowest score; (c) selecting the most probable core structure from the library (core structure selection or fold recognition only) involves comparing probabilities summed over all possible alignments of the sequence to the core, and not comparing individual optimal (or near-optimal) sequence-structure alignments; and (d) assuming uninformative priors, core structure selection is equivalent to comparing the ratio of two global means.
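Conclusions (a) and (c) can be illustrated with a small numerical sketch. The table below is an invented toy joint posterior P(core, alignment | sequence) over a two-core library; the core names `A` and `B`, the alignment labels `t1`..`t3`, the probabilities, and all function names are assumptions made for illustration and are not taken from the article. The sketch only shows the bookkeeping implied by the abstract: core selection sums over alignments, while the joint optimum maximizes the product of core and alignment probabilities.

```python
# Toy joint posterior P(core, alignment | sequence).
# All names and numbers are hypothetical; the entries sum to 1.
joint = {
    "A": {"t1": 0.30, "t2": 0.25, "t3": 0.05},
    "B": {"t1": 0.38, "t2": 0.02},
}


def core_posterior(joint):
    """P(core | sequence): sum the joint over every alignment of each core."""
    return {core: sum(aligns.values()) for core, aligns in joint.items()}


def best_core(joint):
    """Core selection (fold recognition): compare summed probabilities."""
    post = core_posterior(joint)
    return max(post, key=post.get)


def best_alignment(joint, core):
    """Most probable alignment of the sequence to one given core."""
    aligns = joint[core]
    return max(aligns, key=aligns.get)


def best_joint(joint):
    """Most probable (core, alignment) pair across the whole library."""
    pairs = [(core, t) for core, aligns in joint.items() for t in aligns]
    return max(pairs, key=lambda ct: joint[ct[0]][ct[1]])


def product_form(joint, core, t):
    """P(core | seq) * P(alignment | core, seq), which equals the joint entry."""
    p_core = core_posterior(joint)[core]
    p_align_given_core = joint[core][t] / p_core
    return p_core * p_align_given_core


if __name__ == "__main__":
    core = best_core(joint)
    print("most probable core:", core)
    print("its best alignment:", best_alignment(joint, core))
    print("most probable joint pair:", best_joint(joint))
```

In this toy table core `A` has the larger summed probability, yet the single most probable pair belongs to core `B`, because `B` concentrates nearly all of its probability mass on one alignment. Comparing only the best alignment of each core would therefore pick a different core than summing over alignments does.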


Author information

Authors and Affiliations

Department of Information and Computer Science, University of California, Irvine, CA, 92717, USA
Richard H. Lathrop
BioMolecular Engineering Research Center, Boston University, 36 Cummington Street, Boston, MA, 02215, USA
Robert G. Rogers Jr & Temple F. Smith
TASC, 55 Walkers Brook Drive, Reading, MA, 01867, USA
James V. White

Corresponding author

Correspondence to Richard H. Lathrop.

About this article

Cite this article

Lathrop, R.H., Rogers, R.G., Smith, T.F. et al. A Bayes-optimal sequence-structure theory that unifies protein sequence-structure recognition and alignment. Bull. Math. Biol. 60, 1039–1071 (1998). https://doi.org/10.1006/S0092-8240(98)90002-7

Received: 24 July 1997
Accepted: 22 July 1998
Issue Date: November 1998
DOI: https://doi.org/10.1006/S0092-8240(98)90002-7
