Predicting the Start of Protein α-Helices Using Machine Learning Algorithms

  • Rui Camacho
  • Rita Ferreira
  • Natacha Rosa
  • Vânia Guimarães
  • Nuno A. Fonseca
  • Vítor Santos Costa
  • Miguel de Sousa
  • Alexandre Magalhães
Conference paper
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 74)


Proteins are complex structures synthesised by living organisms. They are a fundamental type of molecule and perform a large number of functions in cell biology. Proteins can assume catalytic roles, accelerating or inhibiting chemical reactions in our body. They also take part in the transport of smaller molecules, storage, movement, mechanical support, immunity, and the control of cell growth and differentiation [25]. All of these functions rely on the 3D structure of the protein. The process by which a linear sequence of amino acids, which together compose a protein, acquires its 3D shape is called protein folding. Anfinsen's work [29] showed that the primary structure determines the way a protein folds. Protein folding is so important that, when it does not occur correctly, it may lead to diseases such as Alzheimer's disease, Bovine Spongiform Encephalopathy (BSE, commonly known as mad cow disease), Creutzfeldt-Jakob disease (CJD), Amyotrophic Lateral Sclerosis (ALS), Huntington's disease, Parkinson's disease, and other diseases related to cancer.


Keywords (added by machine, not by the authors): Secondary Structure; Amyotrophic Lateral Sclerosis; Bovine Spongiform Encephalopathy; Machine Learning Algorithm; Inductive Logic Programming




  1. Salamov, A.A., Solovyev, V.V.: Prediction of protein structure by combining nearest-neighbor algorithms and multiple sequence alignments. J. Mol. Biol. 247, 11–15 (1995)
  2. Aha, D., Kibler, D.: Instance-based learning algorithms. Machine Learning 6, 37–66 (1991)
  3. Rost, B.: PHD: predicting 1D protein structure by profile based neural networks. Enzym. 266, 525–539 (1996)
  4. Blader, M., Zhang, X., Matthews, B.: Structural basis of amino acid alpha helix propensity. Science 11, 1637–1640 (1993)
  5. Breiman, L.: Random forests. Machine Learning 45(2), 5–32 (2001)
  6. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees. Wadsworth International Group, Belmont (1984)
  7. Frishman, D., Argos, P.: Seventy-five percent accuracy in protein secondary structure prediction. Proteins 27, 329–335 (1997)
  8. Jones, T.D.: Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology 292, 195–202 (1999)
  9. Kneller, D., Cohen, F.E., Langridge, R.: Improvements in protein secondary structure prediction by an enhanced neural network. Journal of Molecular Biology 216, 441–457 (1990)
  10. Fonseca, N., Camacho, R., Magalhães, A.: A study on amino acid pairing at the N- and C-termini of helical segments in proteins. PROTEINS: Structure, Function, and Bioinformatics 70(1), 188–196 (2008)
  11. Freund, Y., Mason, L.: The alternating decision tree learning algorithm. In: Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, pp. 124–133 (1999)
  12. Wang, G., Dunbrack Jr., R.L.: Pisces: a protein sequence culling server. Bioinformatics 19, 1589–1591 (2003)
  13. Gama, J.: Functional trees. Machine Learning 55(3), 219–250 (2004)
  14. Cuff, J.A., Clamp, M.E., Siddiqui, A.S., Finlay, M., Barton, J.G., Sternberg, M.J.E.: Jpred: a consensus secondary structure prediction server. J. Bioinformatics 14(10), 892–893 (1998)
  15. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann, San Mateo (1995)
  16. Richardson, J., Richardson, D.C.: Amino acid preferences for specific locations at the ends of α-helices. Science 240, 1648–1652 (1988)
  17. King, R., Sternberg, M.: A machine learning approach for the protein secondary structure. Journal of Molecular Biology 214, 171–182 (1990)
  18. King, R., Sternberg, M.: Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci. 5, 2298–2310 (1996)
  19. Krittanai, C., Johnson, W.C.: The relative order of helical propensity of amino acids changes with solvent environment. Proteins: Structure, Function, and Genetics 39(2), 132–141 (2000)
  20. Landwehr, N., Hall, M., Frank, E.: Logistic model trees. Machine Learning 95(1-2), 161–205 (2005)
  21. Muggleton, S. (ed.): Inductive Logic Programming. Academic Press, London (1992)
  22. Muggleton, S., Feng, C.: Efficient induction of logic programs. In: Proceedings of the First Conference on Algorithmic Learning Theory, Ohmsha, Tokyo (1990)
  23. Qian, N., Sejnowski, T.J.: Predicting the secondary structure of globular proteins using neural network models. Journal of Molecular Biology 202, 865–884 (1988)
  24. Petsko, G.A., Petsko, G.A.: Protein Structure and Function (Primers in Biology). New Science Press Ltd. (2007)
  25. Pietzsch, J.: The importance of protein folding. Horizon Symposia (2009)
  26. Chou, P.Y., Fasman, G.D.: Prediction of secondary structure of proteins from their amino acid sequence. Advances in Enzymology and Related Areas of Molecular Biology 47, 45–148 (1978)
  27. Quinlan, R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Mateo (1993)
  28. Saraiva, L., Lopes, L.: Universidade Nova de Lisboa, Instituto de Tecnologia Química e Biológica (2007)
  29. Sela, M., White, F.H., Anfinsen, C.B.: Reductive cleavage of disulfide bridges in ribonuclease. Science 125, 691–692 (1957)
  30. Sternberg, M., Lewis, R., King, R., Muggleton, S.: Modelling the structure and function of enzymes by machine learning. Proceedings of the Royal Society of Chemistry: Faraday Discussions 93, 269–280 (1992)
  31. Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Rui Camacho (1)
  • Rita Ferreira (1)
  • Natacha Rosa (1)
  • Vânia Guimarães (1)
  • Nuno A. Fonseca (2)
  • Vítor Santos Costa (2, 3)
  • Miguel de Sousa (4)
  • Alexandre Magalhães (4)

  1. LIAAD & Faculdade de Engenharia da Universidade do Porto, Portugal
  2. CRACS-INESC Porto, Portugal
  3. DCC-Faculdade de Ciências da Universidade do Porto, Portugal
  4. REQUIMTE/Faculdade de Ciências da Universidade do Porto, Portugal
