From HP Lattice Models to Real Proteins: Coordination Number Prediction Using Learning Classifier Systems

  • Michael Stout
  • Jaume Bacardit
  • Jonathan D. Hirst
  • Natalio Krasnogor
  • Jacek Blazewicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3907)


Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: a) 3D cubic lattice HP model proteins, b) real proteins represented by their HP sequence and c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary computation based machine learning method belonging to the Learning Classifier System (LCS) family. Its performance was compared against alternative learning techniques. Predictions using the HP sequence representation, with only two residue types, were only a little worse than those using the full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone yields prediction accuracies within 5 percentage points of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
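The lattice definition of CN in the abstract can be illustrated with a short Python sketch. This is not the authors' code. The function name `lattice_cn`, the coordinate representation, and the `count_chain_neighbours` switch are assumptions made for illustration. In particular, the abstract does not say whether residues adjacent along the chain count as neighbours, so the sketch exposes both readings.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# The six unit steps to adjacent sites on a 3D cubic lattice.
NEIGHBOUR_STEPS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def lattice_cn(coords: List[Coord], count_chain_neighbours: bool = True) -> List[int]:
    """Return the coordination number of each residue of a lattice conformation.

    coords[i] is the lattice site of residue i. The CN of a residue is the
    number of occupied sites adjacent to it. Whether chain-adjacent residues
    (i and i+1) are counted is an assumption controlled by the flag.
    """
    index_at = {site: i for i, site in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    cn = []
    for i, (x, y, z) in enumerate(coords):
        count = 0
        for dx, dy, dz in NEIGHBOUR_STEPS:
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is None:
                continue  # adjacent site is empty
            if not count_chain_neighbours and abs(i - j) == 1:
                continue  # skip covalently bonded chain neighbours
            count += 1
        cn.append(count)
    return cn


# Hypothetical 4-residue conformation folded into a unit square (z = 0).
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(lattice_cn(square))                                # [2, 2, 2, 2]
print(lattice_cn(square, count_chain_neighbours=False))  # [1, 0, 0, 1]
```

On a cubic lattice the CN is an integer between 0 and 6, which is what makes the lattice case a natural discrete classification target before moving to the real-valued definition used for real proteins.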


Keywords: Window Size; Coordination Number; State Prediction; Protein Structure Prediction; Residue Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Abe, H., Go, N.: Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20, 1013–1031 (1981)
  2. Hart, W.E., Istrail, S.: Crystallographical universal approximability: A complexity theory of protein folding algorithms on crystal lattices. Technical Report SAND95-1294, Sandia National Labs, Albuquerque, NM (1995)
  3. Hinds, D., Levitt, M.: A lattice model for protein structure prediction at low resolution. Proc. Natl. Acad. Sci. USA 89, 2536–2540 (1992)
  4. Hart, W., Istrail, S.: Robust proofs of NP-hardness for protein folding: General lattices and energy potentials. Journal of Computational Biology, 1–20 (1997)
  5. Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92, 325–329 (1995)
  6. Escuela, G., Ochoa, G., Krasnogor, N.: Evolving L-systems to capture protein structure native conformations. In: Keijzer, M., Tettamanzi, A.G.B., Collet, P., van Hemert, J., Tomassini, M. (eds.) EuroGP 2005. LNCS, vol. 3447, pp. 74–84. Springer, Heidelberg (2005)
  7. Krasnogor, N., Pelta, D.: Fuzzy memes in multimeme algorithms: a fuzzy-evolutionary hybrid. In: Verdegay, J. (ed.) Fuzzy Sets based Heuristics for Optimization. Springer, Heidelberg (2002)
  8. Krasnogor, N., Hart, W., Smith, J., Pelta, D.: Protein structure prediction with evolutionary algorithms. In: Banzhaf, W., Daida, J., Eiben, A., Garzon, M., Honavar, V., Jakiela, M., Smith, R. (eds.) GECCO 1999: Proceedings of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann, San Francisco (1999)
  9. Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 769–778. Springer, Heidelberg (2002)
  10. Krasnogor, N., de la Canal, E., Pelta, D., Marcos, D., Risi, W.: Encoding and crossover mismatch in a molecular design problem. In: Bentley, P. (ed.) AID 1998: Proceedings of the Workshop on Artificial Intelligence in Design 1998 (1998)
  11. Krasnogor, N., Pelta, D., Marcos, D.H., Risi, W.A.: Protein structure prediction as a complex adaptive system. In: Proceedings of Frontiers in Evolutionary Algorithms 1998 (1998)
  12. Kinjo, A.R., Horimoto, K., Nishikawa, K.: Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 58, 158–165 (2005)
  13. Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures: DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research 4, 575–602 (2003)
  14. Wilson, S.W.: Classifier fitness based on accuracy. Evolutionary Computation 3, 149–175 (1995)
  15. DeJong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learning. Machine Learning 13, 161–188 (1993)
  16. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)
  17. Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Catalonia, Spain (2004)
  18. MacCallum, R.: Striped sheets and protein contact prediction. Bioinformatics 20, 224–231 (2004)
  19. Zhao, Y., Karypis, G.: Prediction of contact maps using support vector machines. In: Proceedings of the IEEE Symposium on BioInformatics and BioEngineering, pp. 26–36. IEEE Computer Society, Los Alamitos (2003)
  20. Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
  21. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
  22. Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
  23. Bacardit, J., Goldberg, D.E., Butz, M.V., Llorà, X., Garrell, J.M.: Speeding-up Pittsburgh learning classifier systems: Modeling time and accuracy. In: Yao, X., Burke, E.K., Lozano, J.A., Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E., Tiňo, P., Kabán, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 1021–1031. Springer, Heidelberg (2004)
  24. Noguchi, T., Matsuda, H., Akiyama, Y.: PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res. 29, 219–220 (2001)
  25. Sander, C., Schneider, R.: Database of homology-derived protein structures. Proteins 9, 56–68 (1991)
  26. Broome, B., Hecht, M.: Nature disfavors sequences of alternating polar and nonpolar amino acids: implications for amyloidogenesis. J. Mol. Biol. 296, 961–968 (2000)
  27. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
  28. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers, San Francisco (1995)
  29. Miller, R.G.: Simultaneous Statistical Inference. Springer, Heidelberg (1981)
  30. Miller, S., Janin, J., Lesk, A., Chothia, C.: Interior and surface of monomeric proteins. J. Mol. Biol. 196, 641–656 (1987)
  31. Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Eng. 16, 323–330 (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael Stout (1)
  • Jaume Bacardit (1)
  • Jonathan D. Hirst (2)
  • Natalio Krasnogor (1)
  • Jacek Blazewicz (3)

  1. Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK
  2. School of Chemistry, University of Nottingham, Nottingham, UK
  3. Institute of Computing Science, Poznan University of Technology, Poznan, Poland
