Abstract
Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: a) 3D cubic lattice HP model proteins, b) real proteins represented by their HP sequence, and c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary-computation-based machine learning method belonging to the Learning Classifier System (LCS) family. We compared its performance against several alternative learning techniques. Predictions using the HP sequence representation, with only two residue types, were only slightly worse than those using the full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone yields prediction accuracies within 5 percentage points of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN, and further justifies studies of simplified models.
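As an illustration of the lattice definition of CN given in the abstract, the following minimal Python sketch counts, for each residue of a conformation on a 3D cubic lattice, how many of the six adjacent lattice sites are occupied. The function name, the coordinate representation and the example conformation are hypothetical and are not taken from the paper. The sketch also assumes that chain neighbors count toward the CN, which the abstract does not specify.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# The six unit steps to adjacent sites on a 3D cubic lattice.
NEIGHBOR_STEPS: List[Coord] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def lattice_coordination_numbers(coords: List[Coord]) -> List[int]:
    """Return, for each residue, the number of occupied adjacent lattice sites.

    Assumption (not stated in the abstract): residues adjacent along the
    chain are counted as neighbors as well.
    """
    occupied = set(coords)
    if len(occupied) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    cn = []
    for x, y, z in coords:
        count = sum(
            (x + dx, y + dy, z + dz) in occupied
            for dx, dy, dz in NEIGHBOR_STEPS
        )
        cn.append(count)
    return cn


# Hypothetical 8-residue HP chain folded onto the corners of a 2x2x2 cube.
hp_sequence = "HPHPPHHP"
conformation = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 1, 1), (1, 1, 1), (1, 0, 1), (0, 0, 1),
]

for residue, cn in zip(hp_sequence, lattice_coordination_numbers(conformation)):
    print(residue, cn)
```

In this compact example every residue sits on a cube corner and therefore has three occupied neighboring sites. Note that the HP residue type does not enter the CN itself; it is only the input from which the CN is to be predicted.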
References
Abe, H., Go, N.: Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20, 1013–1031 (1981)
Hart, W.E., Istrail, S.: Crystallographical universal approximability: A complexity theory of protein folding algorithms on crystal lattices. Technical Report SAND95-1294, Sandia National Labs, Albuquerque, NM (1995)
Hinds, D., Levitt, M.: A lattice model for protein structure prediction at low resolution. Proc. Natl. Acad. Sci. USA 89, 2536–2540 (1992)
Hart, W., Istrail, S.: Robust proofs of NP-hardness for protein folding: General lattices and energy potentials. Journal of Computational Biology, 1–20 (1997)
Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92, 325–329 (1995)
Escuela, G., Ochoa, G., Krasnogor, N.: Evolving L-systems to capture protein structure native conformations. In: Keijzer, M., Tettamanzi, A.G.B., Collet, P., van Hemert, J., Tomassini, M. (eds.) EuroGP 2005. LNCS, vol. 3447, pp. 74–84. Springer, Heidelberg (2005)
Krasnogor, N., Pelta, D.: Fuzzy memes in multimeme algorithms: a fuzzy-evolutionary hybrid. In: Verdegay, J. (ed.) Fuzzy Sets based Heuristics for Optimization. Springer, Heidelberg (2002)
Krasnogor, N., Hart, W., Smith, J., Pelta, D.: Protein structure prediction with evolutionary algorithms. In: Banzhaf, W., Daida, J., Eiben, A., Garzon, M., Honavar, V., Jakiela, M., Smith, R. (eds.) GECCO 1999: Proceedings of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann, San Francisco (1999)
Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 769–778. Springer, Heidelberg (2002)
Krasnogor, N., de la Canal, E., Pelta, D., Marcos, D., Risi, W.: Encoding and crossover mismatch in a molecular design problem. In: Bentley, P. (ed.) AID 1998: Proceedings of the Workshop on Artificial Intelligence in Design 1998 (1998)
Krasnogor, N., Pelta, D., Marcos, D.H., Risi, W.A.: Protein structure prediction as a complex adaptive system. In: Proceedings of Frontiers in Evolutionary Algorithms 1998 (1998)
Kinjo, A.R., Horimoto, K., Nishikawa, K.: Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 58, 158–165 (2005)
Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research 4, 575–602 (2003)
Wilson, S.W.: Classifier fitness based on accuracy. Evolutionary Computation 3, 149–175 (1995)
DeJong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learning. Machine Learning 13, 161–188 (1993)
Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)
Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis. Ramon Llull University, Barcelona, Catalonia, Spain (2004)
MacCallum, R.: Striped sheets and protein contact prediction. Bioinformatics 20, 224–231 (2004)
Zhao, Y., Karypis, G.: Prediction of contact maps using support vector machines. In: Proceedings of the IEEE Symposium on BioInformatics and BioEngineering, pp. 26–36. IEEE Computer Society, Los Alamitos (2003)
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Witten, I.H., Frank, E.: Data Mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann, San Francisco (2000)
Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
Bacardit, J., Goldberg, D.E., Butz, M.V., Llorà, X., Garrell, J.M.: Speeding-up Pittsburgh learning classifier systems: Modeling time and accuracy. In: Yao, X., Burke, E.K., Lozano, J.A., Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E., Tiňo, P., Kabán, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 1021–1031. Springer, Heidelberg (2004)
Noguchi, T., Matsuda, H., Akiyama, Y.: PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res. 29, 219–220 (2001)
Sander, C., Schneider, R.: Database of homology-derived protein structures. Proteins 9, 56–68 (1991)
Broome, B., Hecht, M.: Nature disfavors sequences of alternating polar and nonpolar amino acids: implications for amyloidogenesis. J. Mol. Biol. 296, 961–968 (2000)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers, San Francisco (1995)
Miller, R.G.: Simultaneous Statistical Inference. Springer, Heidelberg (1981)
Miller, S., Janin, J., Lesk, A., Chothia, C.: Interior and surface of monomeric proteins. J. Mol. Biol. 196, 641–656 (1987)
Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Eng. 16, 323–330 (2003)
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stout, M., Bacardit, J., Hirst, J.D., Krasnogor, N., Blazewicz, J. (2006). From HP Lattice Models to Real Proteins: Coordination Number Prediction Using Learning Classifier Systems. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_19
DOI: https://doi.org/10.1007/11732242_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33237-4
Online ISBN: 978-3-540-33238-1
eBook Packages: Computer Science, Computer Science (R0)