Genetic Programming for Predicting Protein Networks

  • Beatriz Garcia
  • Ricardo Aler
  • Agapito Ledezma
  • Araceli Sanchis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5290)


Predicting functional associations between proteins remains one of the main unsolved problems in molecular biology. In this paper, Genetic Programming (GP) is applied to this domain. GP evolves an expression, equivalent to a binary classifier, which predicts whether a given pair of proteins interacts. We take advantage of the flexibility of GP, in particular the possibility of defining new operations. To handle missing values, we define if-unknown, a new operation that is better suited to the semantics of the domain data. In addition, to reduce solution size and computational time, we use the Tarpeian method, which controls the bloat effect in GP. The results obtained confirm the feasibility of using GP in this domain and show that the Tarpeian method improves both search efficiency and the interpretability of the solutions.
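The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper's references point to the lil-gp system), and several details are assumptions made only for illustration: the nested-tuple tree encoding, the three-argument form of if_unknown, the feature name "coexpression", the decision threshold of 0, and the convention that higher fitness is better. The Tarpeian step follows the general idea of the method: programs larger than the population's average size receive, with some probability, the worst possible fitness without being evaluated.

```python
import random

UNKNOWN = None  # assumed encoding for a missing evidence score


def evaluate(node, features):
    """Evaluate an expression tree (nested tuples) on one protein pair."""
    if isinstance(node, (int, float)):
        return float(node)
    if isinstance(node, str):
        return features[node]  # may be UNKNOWN
    op, *args = node
    if op == "if_unknown":
        # Assumed form: ("if_unknown", feature_name, then_branch, else_branch).
        name, then_branch, else_branch = args
        branch = then_branch if features[name] is UNKNOWN else else_branch
        return evaluate(branch, features)
    left, right = (evaluate(arg, features) for arg in args)
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN  # missing values propagate through arithmetic
    if op == "add":
        return left + right
    if op == "mul":
        return left * right
    raise ValueError(f"unknown operation: {op}")


def predict(tree, features, threshold=0.0):
    """Binary decision: True means the pair is predicted to interact."""
    value = evaluate(tree, features)
    return value is not UNKNOWN and value > threshold


def size(node):
    """Number of nodes in a tree; terminals count as one node."""
    if isinstance(node, tuple):
        return 1 + sum(size(child) for child in node[1:])
    return 1


def tarpeian_fitnesses(population, fitness_fn, p_kill=0.3, rng=random):
    """Skip evaluation of some above-average-size programs (Tarpeian idea)."""
    mean_size = sum(size(tree) for tree in population) / len(population)
    worst = float("-inf")  # assumes higher fitness is better
    return [
        worst if size(tree) > mean_size and rng.random() < p_kill
        else fitness_fn(tree)
        for tree in population
    ]


# Hypothetical evolved expression over one hypothetical evidence score.
tree = ("if_unknown", "coexpression", 0.0, ("add", "coexpression", -0.5))
```

With this encoding, `predict(tree, {"coexpression": 0.9})` follows the else branch, while a pair whose score is missing takes the fallback branch instead of failing. Because penalised programs are never evaluated, the Tarpeian step saves computation in addition to limiting tree growth.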


Protein interaction prediction, genetic programming, data integration, bioinformatics, evolutionary computation, machine learning, classification, control bloat





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Beatriz Garcia (1)
  • Ricardo Aler (1)
  • Agapito Ledezma (1)
  • Araceli Sanchis (1)
  1. Computer Science Department, Universidad Carlos III de Madrid, Madrid, Spain
