From HP Lattice Models to Real Proteins: Coordination Number Prediction Using Learning Classifier Systems

  • Conference paper
Applications of Evolutionary Computing (EvoWorkshops 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)

Abstract

Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to study protein folding and protein structure prediction. These models represent a protein sequence using only two residue types, hydrophobic and polar, and restrict residue locations to the sites of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: (a) 3D cubic lattice HP model proteins, (b) real proteins represented by their HP sequence, and (c) real proteins using their residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary-computation-based machine learning method belonging to the Learning Classifier System (LCS) family, and compare its performance against several alternative learning techniques. Predictions using the HP sequence representation, with only two residue types, were only slightly worse than those using the full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone yields prediction accuracies within 5 percentage points of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
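
As a rough illustration of the two simplifications described in the abstract, the Python sketch below reduces an amino acid sequence to its HP form and counts lattice neighbors on a 3D cubic lattice. It is not the authors' code. The hydrophobic residue set, the function names `to_hp` and `lattice_cn`, and the choice to count chain-adjacent residues as neighbors are all assumptions made for this example, since the abstract specifies none of them.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# Assumption: an illustrative hydrophobic set. The abstract does not state
# which residues the paper treats as hydrophobic (H) versus polar (P).
HYDROPHOBIC = set("ACFILMVWY")

# The six nearest-neighbor directions on a 3D cubic lattice.
NEIGHBOR_OFFSETS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def to_hp(sequence: str) -> str:
    """Map a one-letter amino acid sequence to a two-letter HP sequence."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def lattice_cn(coords: List[Coord]) -> List[int]:
    """Return, for each residue, the number of occupied adjacent lattice sites.

    Assumption: residues adjacent along the chain are counted as neighbors.
    """
    occupied = set(coords)
    if len(occupied) != len(coords):
        raise ValueError("two residues occupy the same lattice site")
    return [
        sum((x + dx, y + dy, z + dz) in occupied for dx, dy, dz in NEIGHBOR_OFFSETS)
        for x, y, z in coords
    ]


if __name__ == "__main__":
    # A four-residue chain folded into a square in the z = 0 plane.
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(to_hp("ACDK"))
    print(lattice_cn(square))
```

In this toy square conformation every residue touches exactly two occupied sites. A multi-state prediction target like those mentioned in the abstract would then be obtained by binning such counts into classes, with bin boundaries that the abstract does not give.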

References

  1. Abe, H., Go, N.: Noninteracting local-structure model of folding and unfolding transition in globular proteins. ii. Application to two-dimensional lattice proteins. Biopolymers 20, 1013–1031 (1981)

  2. Hart, W.E., Istrail, S.: Crystallographical universal approximability: A complexity theory of protein folding algorithms on crystal lattices. Technical Report SAND95-1294, Sandia National Labs, Albuquerque, NM (1995)

  3. Hinds, D., Levitt, M.: A lattice model for protein structure prediction at low resolution. Proc. Natl. Acad. Sci. USA 89, 2536–2540 (1992)

  4. Hart, W., Istrail, S.: Robust proofs of NP-hardness for protein folding: General lattices and energy potentials. Journal of Computational Biology, 1–20 (1997)

  5. Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92, 325–329 (1995)

  6. Escuela, G., Ochoa, G., Krasnogor, N.: Evolving L-systems to capture protein structure native conformations. In: Keijzer, M., Tettamanzi, A.G.B., Collet, P., van Hemert, J., Tomassini, M. (eds.) EuroGP 2005. LNCS, vol. 3447, pp. 74–84. Springer, Heidelberg (2005)

  7. Krasnogor, N., Pelta, D.: Fuzzy memes in multimeme algorithms: a fuzzy-evolutionary hybrid. In: Verdegay, J. (ed.) Fuzzy Sets based Heuristics for Optimization. Springer, Heidelberg (2002)

  8. Krasnogor, N., Hart, W., Smith, J., Pelta, D.: Protein structure prediction with evolutionary algorithms. In: Banzhaf, W., Daida, J., Eiben, A., Garzon, M., Honavar, V., Jakiela, M., Smith, R. (eds.) GECCO 1999: Proceedings of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann, San Francisco (1999)

  9. Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 769–778. Springer, Heidelberg (2002)

  10. Krasnogor, N., de la Canal, E., Pelta, D., Marcos, D., Risi, W.: Encoding and crossover mismatch in a molecular design problem. In: Bentley, P. (ed.) AID 1998: Proceedings of the Workshop on Artificial Intelligence in Design 1998 (1998)

  11. Krasnogor, N., Pelta, D., Marcos, D.H., Risi, W.A.: Protein structure prediction as a complex adaptive system. In: Proceedings of Frontiers in Evolutionary Algorithms 1998 (1998)

  12. Kinjo, A.R., Horimoto, K., Nishikawa, K.: Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 58, 158–165 (2005)

  13. Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research 4, 575–602 (2003)

  14. Wilson, S.W.: Classifier fitness based on accuracy. Evolutionary Computation 3, 149–175 (1995)

  15. DeJong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learning. Machine Learning 13, 161–188 (1993)

  16. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)

  17. Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis. Ramon Llull University, Barcelona, Catalonia, Spain (2004)

  18. MacCallum, R.: Striped sheets and protein contact prediction. Bioinformatics 20, 224–231 (2004)

  19. Zhao, Y., Karypis, G.: Prediction of contact maps using support vector machines. In: Proceedings of the IEEE Symposium on BioInformatics and BioEngineering, pp. 26–36. IEEE Computer Society, Los Alamitos (2003)

  20. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)

  21. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)

  22. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)

  23. Bacardit, J., Goldberg, D.E., Butz, M.V., Llorà, X., Garrell, J.M.: Speeding-up Pittsburgh learning classifier systems: Modeling time and accuracy. In: Yao, X., Burke, E.K., Lozano, J.A., Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E., Tiňo, P., Kabán, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 1021–1031. Springer, Heidelberg (2004)

  24. Noguchi, T., Matsuda, H., Akiyama, Y.: PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res. 29, 219–220 (2001)

  25. Sander, C., Schneider, R.: Database of homology-derived protein structures. Proteins 9, 56–68 (1991)

  26. Broome, B., Hecht, M.: Nature disfavors sequences of alternating polar and nonpolar amino acids: implications for amyloidogenesis. J. Mol. Biol. 296, 961–968 (2000)

  27. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)

  28. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers, San Francisco (1995)

  29. Miller, R.G.: Simultaneous Statistical Inference. Springer, Heidelberg (1981)

  30. Miller, S., Janin, J., Lesk, A., Chothia, C.: Interior and surface of monomeric proteins. J. Mol. Biol. 196, 641–656 (1987)

  31. Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Eng. 16, 323–330 (2003)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Stout, M., Bacardit, J., Hirst, J.D., Krasnogor, N., Blazewicz, J. (2006). From HP Lattice Models to Real Proteins: Coordination Number Prediction Using Learning Classifier Systems. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_19

  • DOI: https://doi.org/10.1007/11732242_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-33237-4

  • Online ISBN: 978-3-540-33238-1

  • eBook Packages: Computer Science, Computer Science (R0)
