A Practical Guide to Protein Structure Prediction

  • David T. Jones
Part of the Methods in Molecular Biology™ book series (MIMB, volume 143)


The protein-folding problem is one of the greatest remaining challenges in structural molecular biology (if not the whole of biology). How do proteins get from their primary structure (sequence) to their tertiary structure? How is that information encoded? In short, how do proteins fold? The problem is often framed as a computational one: do we know enough about the rules of protein structure to program a computer to read in a protein sequence and output a correct tertiary structure? Aside from the academic interest in understanding the physics and chemistry of protein folding, why are so many people interested in finding an algorithm (i.e., a method) for predicting the native structure of a protein given just its sequence?


Keywords: Protein Structure Prediction; Fold Recognition; Folding Simulation; CASP2 Experiment; Sequence Databank



Copyright information

© Humana Press Inc. 2000

Authors and Affiliations

  • David T. Jones, Department of Biological Sciences, University of Warwick, Coventry, UK
