A Multi-agent System for Protein Secondary Structure Prediction

  • Giuliano Armano
  • Gianmaria Mancosu
  • Alessandro Orro
  • Eloisa Vargiu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3737)


In this paper, we illustrate a system aimed at predicting protein secondary structures. Our proposal falls in the category of multiple experts, a machine learning technique that, under the assumption of absent or negative correlation in the experts' errors, may outperform monolithic classifier systems. The prediction activity results from the interaction of a population of experts, each integrating genetic and neural technologies. Roughly speaking, an expert of this kind embodies a genetic classifier designed to control the activation of a feedforward artificial neural network. The genetic and neural components (i.e., the guard and the embedded predictor, respectively) perform different tasks and are supplied with different information. Each guard (soft-)partitions the input space, thereby ensuring both the diversity and the specialization of the corresponding embedded predictor, which in turn performs the actual prediction. Guards deal with inputs that encode information strictly related to relevant domain knowledge, whereas embedded predictors process other relevant inputs, each consisting of a limited window of residues. To investigate the performance of the proposed approach, a system has been implemented and tested on the RS126 set of proteins. Experimental results point to the potential of the approach.
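
The guard/embedded-predictor arrangement described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the guard here is a plain function standing in for the genetic classifier, the predictor is a fixed stub standing in for the trained feedforward network, and all names (`residue_windows`, `Expert`, `predict`, the `hydrophobicity` feature, the three-class labels) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

CLASSES = ("H", "E", "C")  # helix, strand, coil (assumed three-class labelling)


def residue_windows(sequence: str, size: int, pad: str = "-") -> List[str]:
    """Return one fixed-size window of residues centred on each position."""
    if size % 2 == 0:
        raise ValueError("window size must be odd")
    half = size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + size] for i in range(len(sequence))]


@dataclass
class Expert:
    # Guard: maps domain-knowledge features to an activation in [0, 1].
    # (Hypothetical stand-in for the genetic classifier.)
    guard: Callable[[Dict[str, float]], float]
    # Embedded predictor: maps a residue window to per-class scores.
    # (Hypothetical stand-in for the feedforward neural network.)
    predictor: Callable[[str], Dict[str, float]]


def predict(experts: Sequence[Expert],
            features: Dict[str, float],
            window: str) -> Optional[str]:
    """Combine the experts whose guards fire, weighting each by its activation."""
    totals = {c: 0.0 for c in CLASSES}
    total_weight = 0.0
    for expert in experts:
        weight = expert.guard(features)
        if weight <= 0.0:
            continue  # guard does not cover this input; expert stays inactive
        scores = expert.predictor(window)
        for c in CLASSES:
            totals[c] += weight * scores[c]
        total_weight += weight
    if total_weight == 0.0:
        return None  # no expert was activated
    return max(CLASSES, key=lambda c: totals[c])


# Two toy experts with made-up guards and constant predictors.
helix_expert = Expert(
    guard=lambda f: 1.0 if f["hydrophobicity"] > 0.5 else 0.0,
    predictor=lambda w: {"H": 0.7, "E": 0.2, "C": 0.1},
)
coil_expert = Expert(
    guard=lambda f: 0.3,
    predictor=lambda w: {"H": 0.1, "E": 0.1, "C": 0.8},
)

windows = residue_windows("ACD", 3)  # ["-AC", "ACD", "CD-"]
label = predict([helix_expert, coil_expert], {"hydrophobicity": 0.8}, windows[0])
print(label)  # H
```

In this sketch the guards see only the feature dictionary while the predictors see only the residue window, mirroring the separation of inputs stated in the abstract. Because the guards return graded activations, their regions may overlap, which is one simple way to read "soft" partitioning.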


Keywords: Secondary Structure; Input Space; Secondary Structure Prediction; Protein Secondary Structure; Protein Tertiary Structure





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Giuliano Armano (1)
  • Gianmaria Mancosu (2)
  • Alessandro Orro (1)
  • Eloisa Vargiu (1)

  1. University of Cagliari, Cagliari, Italy
  2. Shardna Life Sciences, Cagliari, Italy
