From HP Lattice Models to Real Proteins: Coordination Number Prediction Using Learning Classifier Systems

Stout, Michael; Bacardit, Jaume; Hirst, Jonathan D.; Krasnogor, Natalio; Blazewicz, Jacek

doi:10.1007/11732242_19


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3907)

Included in the following conference series:

Workshops on Applications of Evolutionary Computation


Abstract

Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: (a) 3D cubic lattice HP model proteins, (b) real proteins represented by their HP sequence and (c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary-computation-based machine learning method belonging to the Learning Classifier System (LCS) family. Its performance was compared against alternative learning techniques. Predictions using the HP sequence representation, with only two residue types, were only slightly worse than those using the full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone yields prediction accuracies within 5 percentage points of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
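To make levels (a) and (b) concrete, here is a minimal Python sketch, not the authors' code. It assumes a self-avoiding conformation given as integer (x, y, z) coordinates on the cubic lattice. The option to exclude the bonded neighbours i-1 and i+1 from the count is an assumption, since the text above says only "neighboring residues". The HP grouping in `HYDROPHOBIC` is a hypothetical illustration, not the paper's. The equal-frequency binning into 2, 3 or 5 CN states is also an assumed scheme. All function names are hypothetical. The sketch does not implement Kinjo et al.'s real-valued CN or GAssist.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, int, int]

# Hypothetical hydrophobic set, for illustration only (not the paper's grouping).
HYDROPHOBIC = frozenset("ACFILMVW")

# The six unit steps between adjacent sites of the 3D cubic lattice.
NEIGHBOUR_OFFSETS: List[Coord] = [
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
]


def to_hp(sequence: str) -> str:
    """Reduce a one-letter amino acid sequence to the two-letter HP alphabet."""
    return "".join("H" if aa in HYDROPHOBIC else "P" for aa in sequence.upper())


def lattice_cn(coords: Sequence[Coord], skip_chain_neighbours: bool = False) -> List[int]:
    """For each residue, count the adjacent lattice sites occupied by another residue.

    With skip_chain_neighbours=True, the residues at i-1 and i+1 (always adjacent
    on the lattice) are not counted (an assumed variant of the definition).
    """
    occupied: Dict[Coord, int] = {c: i for i, c in enumerate(coords)}
    if len(occupied) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    cns: List[int] = []
    for i, (x, y, z) in enumerate(coords):
        count = 0
        for dx, dy, dz in NEIGHBOUR_OFFSETS:
            j = occupied.get((x + dx, y + dy, z + dz))
            if j is None:
                continue  # empty lattice site
            if skip_chain_neighbours and abs(i - j) == 1:
                continue  # bonded neighbour along the chain
            count += 1
        cns.append(count)
    return cns


def equal_frequency_bounds(values: Sequence[float], n_states: int) -> List[float]:
    """Cut points splitting the sorted values into n_states bins of roughly equal size."""
    ordered = sorted(values)
    return [ordered[(k * len(ordered)) // n_states] for k in range(1, n_states)]


def discretise(value: float, bounds: Sequence[float]) -> int:
    """Class label = number of cut points the value reaches or exceeds (0 = lowest CN)."""
    return sum(value >= b for b in bounds)


if __name__ == "__main__":
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]  # 4-residue square, z = 0 plane
    print(to_hp("ACDK"))                                     # HHPP
    print(lattice_cn(square))                                # [2, 2, 2, 2]
    print(lattice_cn(square, skip_chain_neighbours=True))    # [1, 0, 0, 1]
    bounds = equal_frequency_bounds([0, 1, 2, 3, 4, 5], 3)
    print(bounds, discretise(5, bounds))                     # [2, 4] 2
```

The same binning helpers apply unchanged to real-valued CNs from real proteins. Only the CN computation, and the choice of class boundaries, would differ.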


References

Abe, H., Go, N.: Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers 20, 1013–1031 (1981)
Hart, W.E., Istrail, S.: Crystallographical universal approximability: A complexity theory of protein folding algorithms on crystal lattices. Technical Report SAND95-1294, Sandia National Labs, Albuquerque, NM (1995)
Hinds, D., Levitt, M.: A lattice model for protein structure prediction at low resolution. Proc. Natl. Acad. Sci. USA 89, 2536–2540 (1992)
Hart, W., Istrail, S.: Robust proofs of NP-hardness for protein folding: General lattices and energy potentials. Journal of Computational Biology, 1–20 (1997)
Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92, 325–329 (1995)
Escuela, G., Ochoa, G., Krasnogor, N.: Evolving L-systems to capture protein structure native conformations. In: Keijzer, M., Tettamanzi, A.G.B., Collet, P., van Hemert, J., Tomassini, M. (eds.) EuroGP 2005. LNCS, vol. 3447, pp. 74–84. Springer, Heidelberg (2005)
Krasnogor, N., Pelta, D.: Fuzzy memes in multimeme algorithms: a fuzzy-evolutionary hybrid. In: Verdegay, J. (ed.) Fuzzy Sets Based Heuristics for Optimization. Springer, Heidelberg (2002)
Krasnogor, N., Hart, W., Smith, J., Pelta, D.: Protein structure prediction with evolutionary algorithms. In: Banzhaf, W., Daida, J., Eiben, A., Garzon, M., Honavar, V., Jakiela, M., Smith, R. (eds.) GECCO 1999: Proceedings of the Genetic and Evolutionary Computation Conference. Morgan Kaufmann, San Francisco (1999)
Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Guervós, J.J.M., Adamidis, P.A., Beyer, H.-G., Fernández-Villacañas, J.-L., Schwefel, H.-P. (eds.) PPSN 2002. LNCS, vol. 2439, pp. 769–778. Springer, Heidelberg (2002)
Krasnogor, N., de la Canal, E., Pelta, D., Marcos, D., Risi, W.: Encoding and crossover mismatch in a molecular design problem. In: Bentley, P. (ed.) AID 1998: Proceedings of the Workshop on Artificial Intelligence in Design 1998 (1998)
Krasnogor, N., Pelta, D., Marcos, D.H., Risi, W.A.: Protein structure prediction as a complex adaptive system. In: Proceedings of Frontiers in Evolutionary Algorithms 1998 (1998)
Kinjo, A.R., Horimoto, K., Nishikawa, K.: Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 58, 158–165 (2005)
Baldi, P., Pollastri, G.: The principled design of large-scale recursive neural network architectures: DAG-RNNs and the protein structure prediction problem. Journal of Machine Learning Research 4, 575–602 (2003)
Wilson, S.W.: Classifier fitness based on accuracy. Evolutionary Computation 3, 149–175 (1995)
De Jong, K.A., Spears, W.M., Gordon, D.F.: Using genetic algorithms for concept learning. Machine Learning 13, 161–188 (1993)
Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan Press (1975)
Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Catalonia, Spain (2004)
MacCallum, R.: Striped sheets and protein contact prediction. Bioinformatics 20, 224–231 (2004)
Zhao, Y., Karypis, G.: Prediction of contact maps using support vector machines. In: Proceedings of the IEEE Symposium on BioInformatics and BioEngineering, pp. 26–36. IEEE Computer Society, Los Alamitos (2003)
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997)
Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, San Francisco (2000)
Rissanen, J.: Modeling by shortest data description. Automatica 14, 465–471 (1978)
Bacardit, J., Goldberg, D.E., Butz, M.V., Llorà, X., Garrell, J.M.: Speeding-up Pittsburgh learning classifier systems: Modeling time and accuracy. In: Yao, X., Burke, E.K., Lozano, J.A., Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E., Tiňo, P., Kabán, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 1021–1031. Springer, Heidelberg (2004)
Noguchi, T., Matsuda, H., Akiyama, Y.: PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB). Nucleic Acids Res. 29, 219–220 (2001)
Sander, C., Schneider, R.: Database of homology-derived protein structures. Proteins 9, 56–68 (1991)
Broome, B., Hecht, M.: Nature disfavors sequences of alternating polar and nonpolar amino acids: implications for amyloidogenesis. J. Mol. Biol. 296, 961–968 (2000)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)
John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345. Morgan Kaufmann Publishers, San Francisco (1995)
Miller, R.G.: Simultaneous Statistical Inference. Springer, Heidelberg (1981)
Miller, S., Janin, J., Lesk, A., Chothia, C.: Interior and surface of monomeric proteins. J. Mol. Biol. 196, 641–656 (1987)
Li, T., Fan, K., Wang, J., Wang, W.: Reduction of protein sequence complexity by residue grouping. Protein Eng. 16, 323–330 (2003)


Author information

Authors and Affiliations

Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK
Michael Stout, Jaume Bacardit & Natalio Krasnogor
School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
Jonathan D. Hirst
Institute of Computing Science, Poznan University of Technology, ul. Piotrowo 3a, Poznan, 60-965, Poland
Jacek Blazewicz


Editor information

Editors and Affiliations

Johannes Gutenberg University, Mainz, Germany
Franz Rothlauf
Institute AIFB, University of Karlsruhe, 76128, Karlsruhe, Germany
Jürgen Branke
Dipartimento di Ingegneria dell’Informazione, Università di Parma, Italy
Stefano Cagnoni
Centre of Informatics and Systems of the University of Coimbra, Portugal
Ernesto Costa
Dept. LCC, Universidad de Málaga, Spain
Carlos Cotta
Institute of Computer Science, University of Bremen, 28359, Bremen, Germany
Rolf Drechsler
INRIA Saclay - Ile-de-France, Parc Orsay Université, 4, rue Jacques Monod, 91893, ORSAY Cedex, France
Evelyne Lutton
CISUC, Department of Informatics Engineering, University of Coimbra, Polo II of the University of Coimbra, 3030, Coimbra, Portugal
Penousal Machado
Dartmouth College, Lebanon, NH, USA
Jason H. Moore
Universidade de A Coruña, CP 15071, A Coruña, Spain
Juan Romero
School of Computing Sciences, UEA Norwich, University of East Anglia, NR4 7TJ, Norwich, UK
George D. Smith
Dipartimento di Automatica e Informatica, Politecnico di Torino, Italy
Giovanni Squillero
Kyushu University, Japan
Hideyuki Takagi


About this paper

Cite this paper

Stout, M., Bacardit, J., Hirst, J.D., Krasnogor, N., Blazewicz, J. (2006). From HP Lattice Models to Real Proteins: Coordination Number Prediction Using Learning Classifier Systems. In: Rothlauf, F., et al. Applications of Evolutionary Computing. EvoWorkshops 2006. Lecture Notes in Computer Science, vol 3907. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11732242_19


DOI: https://doi.org/10.1007/11732242_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33237-4
Online ISBN: 978-3-540-33238-1
eBook Packages: Computer Science (R0)
