
A Bayes-optimal sequence-structure theory that unifies protein sequence-structure recognition and alignment

Bulletin of Mathematical Biology

Abstract

A rigorous Bayesian analysis is presented that unifies protein sequence-structure alignment and recognition. Given a sequence, explicit formulae are derived to select (1) its globally most probable core structure from a structure library; (2) its globally most probable alignment to a given core structure; (3) its most probable joint core structure and alignment, chosen globally across the entire library; and (4) its most probable individual segments, secondary structure, and super-secondary structures across the entire library. The computations involved are NP-hard in the general case (3D-3D). Fast exact recursions are given for the restricted case in which only sequence singleton terms are scored (1D-3D). Conclusions include: (a) the most probable joint core structure and alignment is not necessarily the most probable alignment of the most probable core structure, but rather maximizes the product of core and alignment probabilities; (b) use of a sequence-independent linear or affine gap penalty may result in the highest-probability threading not having the lowest score; (c) selecting the most probable core structure from the library (core structure selection or fold recognition only) involves comparing probabilities summed over all possible alignments of the sequence to the core, not comparing individual optimal (or near-optimal) sequence-structure alignments; and (d) assuming uninformative priors, core structure selection is equivalent to comparing the ratio of two global means.
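Conclusions (a) and (c) turn on the difference between summing and maximizing over alignments. The following is a minimal sketch of that difference for a simplified singleton-only (1D-3D) threading model; it is not the authors' formulation, and every name in it (`thread_dp`, `segment_weight`, `enumerate_alignments`, `toy_score`) is hypothetical. Assumptions: a core is a list of fixed-length segments whose positions carry environment labels; loops between segments may have any length, including zero; the alignment prior is uniform over all legal placements; and the score is a log-odds singleton score, so exp(-score) equals the likelihood P(a | C, t) divided by a background probability of the sequence that is the same for every core and alignment. One recursion serves both purposes: combining with `sum` gives the partition sum over alignments, and combining with `max` gives the best single alignment.

```python
import math

HYDROPHOBIC = set("AVILMFWC")


def toy_score(env, aa):
    """Hypothetical singleton score: -1 if the residue's hydrophobicity matches
    the burial label of its core position ('B' buried, 'E' exposed), else +1."""
    return -1.0 if (env == "B") == (aa in HYDROPHOBIC) else 1.0


def segment_weight(env, seq, start, score):
    """Product of singleton factors exp(-score) for one core segment whose
    first position is aligned to sequence index `start`."""
    return math.exp(-sum(score(e, seq[start + j]) for j, e in enumerate(env)))


def thread_dp(core, seq, score, combine):
    """Exact recursion over all order-preserving placements of the core segments.

    core:    list of segments, each a list of environment labels.
    combine: `sum` gives the partition sum over alignments, `max` the best one.
    """
    n = len(seq)
    prev = []
    for k, env in enumerate(core):
        cur = [0.0] * n  # cur[s]: combined weight of segments 0..k, segment k at s
        for s in range(n - len(env) + 1):
            w = segment_weight(env, seq, s, score)
            if k == 0:
                cur[s] = w
            else:
                # Segment k-1 must end before s; any loop length >= 0 is allowed.
                earlier = [prev[p] for p in range(s - len(core[k - 1]) + 1)]
                cur[s] = w * combine(earlier) if earlier else 0.0
        prev = cur
    return combine(prev) if prev else 0.0


def enumerate_alignments(core, n):
    """Brute force: yield every tuple of segment start positions (for checking)."""
    def rec(k, first_free):
        if k == len(core):
            yield ()
            return
        for s in range(first_free, n - len(core[k]) + 1):
            for rest in rec(k + 1, s + len(core[k])):
                yield (s,) + rest
    yield from rec(0, 0)


if __name__ == "__main__":
    seq = "MKVLAEGSTIWL"  # hypothetical query sequence
    cores = {  # hypothetical two-entry core library
        "core1": [list("BBE"), list("EBB")],
        "core2": [list("BEBE"), list("BB")],
    }
    for name, core in cores.items():
        z = thread_dp(core, seq, toy_score, sum)
        best = thread_dp(core, seq, toy_score, max)
        count = thread_dp(core, seq, lambda e, a: 0.0, sum)  # number of alignments
        # With uniform priors over cores and over each core's alignments:
        #   z / count    is proportional to P(core | seq)           (core selection)
        #   best / count is proportional to max_t P(core, t | seq)  (joint choice)
        print(f"{name}: alignments={count:.0f}  "
              f"z/count={z / count:.4g}  best/count={best / count:.4g}")
```

Under these assumptions, P(C | a) is proportional to P(C) times the likelihood averaged over the core's alignments (`z / count`), which is the summed quantity the abstract says core selection should compare. The joint core-and-alignment choice instead maximizes `best / count`, the product of the core and alignment terms. Nothing forces the two columns to rank the cores in the same order.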



Author information

Corresponding author

Correspondence to Richard H. Lathrop.


About this article

Cite this article

Lathrop, R.H., Rogers, R.G., Smith, T.F. et al. A Bayes-optimal sequence-structure theory that unifies protein sequence-structure recognition and alignment. Bull. Math. Biol. 60, 1039–1071 (1998). https://doi.org/10.1006/S0092-8240(98)90002-7


