Protein Secondary Structure Prediction

  • Walter Pirovano
  • Jaap Heringa
Part of the Methods in Molecular Biology book series (MIMB, volume 609)


Although predicting a native protein structure from sequence remains a challenging problem, computational methods have over the past decades become quite successful at exploiting the mechanisms behind secondary structure formation. The great effort expended in this area has produced a vast number of secondary structure prediction methods. In particular, combining well-optimized, sensitive machine-learning algorithms with homologous sequence information has raised prediction accuracies to as much as 80%. In this chapter, we first introduce some basic notions and give a brief history of advances in secondary structure prediction. We then provide a comprehensive overview of state-of-the-art prediction methods. Finally, we discuss open questions and challenges in the field and offer some practical recommendations for the user.

Key words

secondary structure, secondary structure prediction, multiple sequence alignment



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Walter Pirovano
    • 1
  • Jaap Heringa
    • 1
  1. Centre for Integrative Bioinformatics VU (IBIVU), VU University, Amsterdam, The Netherlands
