Machine Learning, Volume 70, Issue 2–3, pp 225–240

Inductive logic programming for gene regulation prediction

  • Sebastian Fröhler
  • Stefan Kramer


We present a systems biology application of ILP, where the goal is to predict the regulation of a gene under a certain condition from binding site information, the state of regulators, and additional information. Given the same information, the boosted Tilde model performs on par in our experiments with the original model of Middendorf et al., which is based on alternating decision trees (ADTrees). When functional categorizations and protein-protein interactions are added, however, performance improves substantially. We believe that decoding the regulation mechanisms of genes is an exciting new application of learning in logic, one that requires data integration from various sources and may contribute to a better understanding at the system level.


Inductive logic programming · Relational learning · Gene regulation · Gene expression · Systems biology


  1. Allocco, D. J., Kohane, I. S., & Butte, A. J. (2004). Quantifying the relationship between co-expression and co-regulation and gene function. BMC Bioinformatics, 5(18).
  2. Blockeel, H., & De Raedt, L. (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence, 101(1–2), 285–297.
  3. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
  4. Dez, C., & Tollervey, D. (2004). Ribosome synthesis meets the cell cycle. Current Opinion in Microbiology, 7(6), 631–637.
  5. Freund, Y., & Mason, L. (1999). The alternating decision tree learning algorithm. In Proceedings 16th international conference on machine learning (ICML 1999) (pp. 124–133). Los Altos: Kaufmann.
  6. Fröhler, S. (2006). Machine learning for gene regulation prediction. Diploma thesis, TU München.
  7. Gasch, A. P., Spellman, P. T., Kao, C. M., Carmel-Harel, O., Eisen, M. B., Storz, G., Botstein, D., & Brown, P. O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Molecular Biology of the Cell, 11(12), 4241–4257.
  8. Jeong, J., Johns, J., Sinclair, C., Park, J., & Rossie, S. (2003). Characterization of Saccharomyces cerevisiae protein Ser/Thr phosphatase T1 and comparison to its mammalian homolog PP5. BMC Cell Biology, 4(3).
  9. Latchman, D. (2005). Gene regulation: a eukaryotic perspective (5th ed.). London: Taylor & Francis.
  10. Mewes, H., Albermann, K., Heumann, K., Liebl, S., & Pfeiffer, F. (1997). MIPS: a database for protein sequences, homology data and yeast genome information. Nucleic Acids Research, 25(1), 28–30.
  11. Middendorf, M., Kundaje, A., Wiggins, C., Freund, Y., & Leslie, C. (2004). Predicting genetic regulatory response using classification. Bioinformatics, 20(suppl_1), 232–240.
  12. Ong, I., Page, D., & Santos Costa, V. (2006). Inferring regulatory networks from time series expression data and relational data via inductive logic programming. In Proceedings 16th international conference on inductive logic programming (ILP 2006), short papers.
  13. Park, H.-O., & Craig, E. A. (1989). Positive and negative regulation of basal expression of a yeast HSP70 gene. Molecular and Cellular Biology, 9(5), 2025–2033.
  14. Ruepp, A., Zollner, A., Maier, D., Albermann, K., Hani, J., Mokrejs, M., Tetko, I., Guldener, U., Mannhaupt, G., Munsterkotter, M., & Mewes, H. W. (2004). The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes. Nucleic Acids Research, 32(18), 5539–5545.
  15. Segal, E., Shapira, M., Regev, A., Pe’er, D., Botstein, D., Koller, D., & Friedman, N. (2003). Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data. Nature Genetics, 34(2), 166–176.
  16. van Helden, J., André, B., & Collado-Vides, J. (1998). Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology, 281, 827–842.
  17. Wiederrecht, G., Seto, D., & Parker, C. (1988). Isolation of the gene encoding the S. cerevisiae heat shock transcription factor. Cell, 54(6), 841–853.
  18. Wingender, E., Dietze, P., Karas, H., & Knueppel, R. (1996). TRANSFAC: a database on transcription factors and their DNA binding sites. Nucleic Acids Research, 24(1), 238–241.
  19. Witten, I. H., & Frank, E. (2005). Data mining: practical machine learning tools and techniques (2nd ed.). Los Altos: Kaufmann.

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Institut für Informatik, Technische Universität München, Garching bei München, Germany
