Soft Computing, 13:245

Prediction of topological contacts in proteins using learning classifier systems

  • Michael Stout
  • Jaume Bacardit
  • Jonathan D. Hirst
  • Robert E. Smith
  • Natalio Krasnogor
Focus

Abstract

Evolutionary-based data mining techniques are increasingly applied to problems in the bioinformatics domain. We investigate an important aspect of predicting the folded 3D structure of proteins from their unfolded residue sequence using evolutionary-based machine learning techniques. Our approach is to predict specific features of residues in folded protein chains, in particular features derived from Delaunay tessellations, Gabriel graphs, relative neighborhood graphs and minimum spanning trees. Several standard machine learning algorithms were compared to a state-of-the-art learning method, a learning classifier system (LCS), which is capable of generating compact and interpretable rule sets. Predictions were performed at various degrees of precision using a range of experimental parameters. Examples of the rules obtained are presented. The LCS achieves good predictive performance and generates competent yet simple and interpretable classification rules.
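The proximity graphs named in the abstract can be illustrated with a small sketch. The code below builds the Gabriel graph, the relative neighborhood graph and a minimum spanning tree over a set of 3D points by brute force, then counts the edges at each point. The function names, the brute-force construction and the use of vertex degree as a per-residue contact count are assumptions made for illustration; they are not taken from the paper. The Delaunay tessellation is omitted because it normally requires a computational geometry library.

```python
from itertools import combinations


def sqdist(p, q):
    """Squared Euclidean distance (avoids square roots and rounding)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def gabriel_edges(points):
    """Keep (i, j) if no other point lies strictly inside the sphere
    whose diameter is the segment between points i and j."""
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        d_ij = sqdist(points[i], points[j])
        if all(sqdist(points[i], points[k]) + sqdist(points[j], points[k]) >= d_ij
               for k in range(n) if k != i and k != j):
            edges.add((i, j))
    return edges


def rng_edges(points):
    """Keep (i, j) if no other point is strictly closer to both i and j
    than they are to each other (empty lune condition)."""
    n = len(points)
    edges = set()
    for i, j in combinations(range(n), 2):
        d_ij = sqdist(points[i], points[j])
        if all(max(sqdist(points[i], points[k]), sqdist(points[j], points[k])) >= d_ij
               for k in range(n) if k != i and k != j):
            edges.add((i, j))
    return edges


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph."""
    n = len(points)
    if n == 0:
        return set()
    # best[v] = (squared distance to the tree, nearest tree vertex)
    best = {v: (sqdist(points[0], points[v]), 0) for v in range(1, n)}
    edges = set()
    while best:
        j = min(best, key=lambda v: best[v][0])
        _, i = best.pop(j)
        edges.add((min(i, j), max(i, j)))
        for k in best:
            d = sqdist(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def degrees(n, edges):
    """Number of graph edges at each point (hypothetical contact count)."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg


if __name__ == "__main__":
    # Made-up coordinates standing in for residue positions.
    pts = [(0, 0, 0), (4, 0, 0), (2, 1, 0), (2, 5, 3), (7, 7, 7)]
    for name, build in [("Gabriel", gabriel_edges), ("RNG", rng_edges), ("MST", mst_edges)]:
        e = build(pts)
        print(name, sorted(e), degrees(len(pts), e))
```

With these definitions every minimum spanning tree edge is also a relative neighborhood graph edge, and every relative neighborhood graph edge is also a Gabriel graph edge, so the graphs give progressively sparser notions of contact.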

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Michael Stout (1)
  • Jaume Bacardit (1)
  • Jonathan D. Hirst (2)
  • Robert E. Smith (3)
  • Natalio Krasnogor (1)

  1. Automated Scheduling, Optimization and Planning Research Group, School of Computer Science and IT, University of Nottingham, Nottingham, UK
  2. School of Chemistry, University of Nottingham, Nottingham, UK
  3. Intelligent Systems Group, Computer Science Department, University College London, London, UK
